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ABSTRACT 

"Approximate Bayesian Computation" (ABC) represents a powerful methodology for 
the analysis of complex stochastic systems for which the likelihood of the observed 
data under an arbitrary set of input parameters may be entirely intractable — the 
latter condition rendering useless the standard machinery of tractable likelihood-based, 
Bayesian statistical inference (e.g. conventional Markov Chain Monte Carlo simulation; 
MCMC). In this article we demonstrate the potential of ABC for astronomical model 
analysis by application to a case study in the morphological transformation of high 
redshift galaxies. To this end we develop, first, a stochastic model for the competing 
processes of merging and secular evolution in the early Universe; and second, through 
an ABC-based comparison against the observed demographics of the first generation 
of massive {M m \ > 10 n M o ) galaxies (at 1.5 < z < 3) in the CANDELS/EGS dataset 
we derive posterior probability densities for the key parameters of this model. The 
"Sequential Monte Carlo" (SMC) implementation of ABC exhibited herein, featuring 
both a self-generating target sequence and self-refining MCMC kernel, is amongst the 
most efficient of contemporary approaches to this important statistical algorithm. We 
highlight as well through our chosen case study the value of careful summary statistic 
selection, and demonstrate two modern strategies for assessment and optimisation 
in this regard. Ultimately, our ABC analysis of the high redshift morphological mix 
returns tight constraints on the evolving merger rate in the early Universe and reveals 
merging, rather than secular evolution, as the most important mechanism for building 
up the first generation of bulges in early-type disks. 
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1 INTRODUCTION 

With origins in population gen e tics and evolut i onar 



Witn origins m p opulation genetics ana ev olutionary 
biology (e.g. iTavare et al l Il997l; iPritchard et all 1 19991 : 
iBeaumont et al.ll2002l ; and see Tcsillerv et al.ll201(j for a re- 
cent review) Approximate Bayesian Computation (ABC) of- 
fers a powerful technique for recovering posterior probabil- 
ity densities from complex stochastic models for which the 
likelihood of the observed data under a given set of input 
parameters may be entirely intractable. Examples include 
the estimation of time to the most recent common ances- 
tor under the coalescent model with re combination given a 
full s uite of modern DNA sequencing (|Marioram fc Tavarel 
2006), or the derivation of transition probabilities in contin- 
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uous time Markov models of mac roparasite population evo - 
lution from simple demographics (|Drovandi fc Pet titt 2010). 
However, although there exist a variety of important astro- 
physical models with inherently intractable likelihoods, to- 
date applications of ABC in this field remain surprisingly 
rare0 The only indispensable ingredients required for ABC 



1 Indeed the authors can find no astronomical reference to either 
the terms "approximate Bayesian computation" or "likelihood- 
free" (inference) on the NASA ADS database; and Google Scholar 
indicates no astronomical citations yet to any of the biologi- 
cal/mathematical ABC literature mentioned herein. The power- 
point slides from a talk by Chad Schafer at the Statistical Chal- 
lenges in Modern Astronomy V conference in 2011 available at 
URL ( http : / / astrostatistics . psu.edu /sullscma5/ scma5 . html ) de- 
tailing two potential uses for ABC in extra-galactic data analysis 
represent to our knowledge the only prior application in this field. 
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are: (i) a stochastic model for the observed data, replicating 
the behaviour of all random processes driving the system 
at hand as well as any relevant observational errors; and 
(11) a discrepancy measure, based typically on a set of low 
order summary statistics, to quantitatively gauge similarity 
between output from this model and the empirical bench- 
mark. 

One potentially valuable role for ABC in an astronom- 
ical context may thus be in the constraint of se mi-analytic 
mode l s (SAMs) of ga l axy formation (cf. ICole et al.l 
200d; iBenson et alj 120031; iBaugbl |2006l: iBower et al l l200d ; 



De Lucia et all l20ld ; iNeistein fc Weinmannl |201Ch 
which the output at run-time necessarily exhibits complex 
stochasticity owing to the effects of cosmic variance (in- 
duced computationally vi a sampling from within large-scale 
dark matter simulations, ISpringel et al.l 1200 ll ; iKnebe et al.l 
l201ll ; or via Monte Carlo construction of halo m erger 
trees, lLacev fc Coll 1 19931 ; iParkinson etafl 120081 ). For 
ABC analysis of such codes an appropriate discrepancy 
parameter might then be the metric distance between sim- 
ulated and observed luminosity functions under a sensible 
binning scheme0 In the (perhaps-not-so) disparate fields of 
Active Galactic Nucleus (AGN) power spectrum analysi s 
(|Uttlev et al.ll2002l;llngram fc Donell201ll;lKelly et al.ll201ll), 
aster oseismology (jKieldsen fc Bedding! 12004; iHekker et al 



2011), and initial mass fun ction profiling (jFumagalli et al 



2011 ; |Pa Silva et al.l 1201 ll ) one also routinely encounters 
models from which random simulations can readily be 
made, but for which the corresponding likelihood functions 
(featuring correlated noise or population clustering) are in- 
herently intractable. Where conditions (1) and (11) above are 
thus satisfied ABC of fers an easily-implemen t ed, theoreti- 
cally well-established jNunes fc Balding! l20ld ; iMarin et all 
l201ll ; iFearnhead fc Pranglei l2oTi T alternative to the com- 
putationally intensive "approximate likelihood" approach 
(requi r ing v e ry large scale s i mulat i on/re-simulation , e.g. 
IWoodl l20ld; IHenriques et all 120091 ; iLu et all l201ll ; and 
note IBenson et al.l l201ll regarding the required diversity of 
merger trees sampled for genuine convergence of SAMs), or 
the user-intensive application of model emulators (requiring 
a non-trivial degree of run-tim e supervision and operator 
expertise, cf. IBower et al.i 2010 and references therein) 



2 For readers familiar with the work of IBower et al. wo 
note that the "discrepancy parameter" introduced for their emu- 
lation of the GALFORM SAM could not be employed as such in 
ABC as it is not in fact a gauge of model-data similarity; indeed 
it serves an entirely different purpose in their analysis, acting as 
an error term for cosmic variance and structural uncertainty in 
their code. 

3 As a caveat t o the above referencing we note that: (1) th ough 
the analyses of IHenriques et all ||2009| ) and ILu et all j201lT ) are 
both conducted broadly in th e style of the "approximate likeli- 
hood" approach formalised by|Wood (20lrij) their specific imple- 
mentations contain certain ad hoc features of dubious statistical 
merit (e.g., the neglect of cosmic variance uncertainty and cor- 
relation between bins when computing the luminosity function 
likelihood in the former, and in the latter the arbitrary three- 
fold inflation of the observational error bars to account for un- 
certainty in these uncerta inties); and (11) though the work of 
Kampakoglou et al. (2008) has in previous papers been cited as 
an example of MCMC-based SAM constraint, in fact, their study 
concerns a purely analytic model for which there exists no in- 



In this paper we illustrate heuristically the power of 
ABC for astronomical model analysis through application 
to a case study in the morphological transformation of mas- 
sive galaxies at high redshift. In particular, we demonstrate a 
contemporary Sequentia l Monte Carlo (SMC) formulation of 
the ABC algorithm (cf. iDel Moral et al.ll200o1 ; ISisson et al.l 
l2007l ; lDrovandi fc Pettittl20ld ). as well as a regression-based 
procedure for constructing an optimal summary statistic- 
discrepancy measure pairing for the pu rpose of parameter 
estimation |Fearnhead fc Pranglei 120121 ). Importantly, the 
stochastic model we explore herein features both an "in- 
dependent evolution" case for which the likelihood is in fact 
tractable and a "co-evolution" case for which it is not — the 
former allowing the reliability of our ABC-based solution 
to be established against conventional Markov Chain Monte 
Carlo (MCMC) simulation and the latter a demonstration 
of the unique possibilities of ABC analysis. 

Installation of the new Wide-Field Camera 3 (WFC3) 
on the Hubble Space Telescope (HST) in 2009— and the 
subsequent allocation of vast amounts of observing time 
to deep, near-infrared (NIR) surveys with this instru- 
ment, including the E arly Release Science program (ERS; 
IWindhorst et akll201ll ) and the Cosmic Asse mbly Near-IR 
Deep Extragalatic Legacy Sur vey (CANDELS; lGrogin et all 
120111 ; iKoekemoer" et al.l l201ll ) — has at last made accessi- 
ble (at high resolution) the rest-frame optical morpholo- 
gies of distant galaxies at the epoch of p eak cosmic 
star formation and AGN activity (z ~ 2; iLillv et al 



199d; iMadau et ail Il99d ; lOesch et al] 120121 : IWarren et a! 



1994 ). Early studies exploiting these new datasets have 



documente d the emergenc e of th e first Hubble sequence 
analogues dCameron et all l2011bl ; IConselice et alj l2011al ; 
ISzomoru et al.l l2011al ), demo nstrated the compac t ness of 
the first massive sp heroids |Szomoru et al l l20ld , l2011bl ; 
iNewman et aljfeoill ). explored the unique characteristics o f 
galaxies ultraluminous at infrared (iKartaltepe et al.ll201ll) 
and X-ray wavelengths dKocevski et al. 20121 ; Rosario et alj 
l201ll ; ISchawinski et al.ll201lj ). and probed structural trans- 
formation in extreme cluster environments l|Lotz et al.ll201ll ; 
iPapovich et al.ll201ll ). Thus far, however, there have been 
remarkably few studies to exploit the full potential of de- 
mographic analysis for constraining pathways of galaxy 
evolution — one early exemplar being Bell et al.'s (2011) 
search for correlations between the global observables of key 
galaxy sub-populations in the CANDELS dataset divided 
coarsely by rest-frame optical morpholo gical type (via the 
usual proxy of global Sersic index; c f. Driver et alj l200d ; 
ICameron fc Driverll2009l ; lKelvin et al.ll201ll ). Hence, we have 
chosen here specifically for our exposition of the ABC tech- 
nique a case study in the demographic analysis of WFC3 
data in the hope of motivating further research in this di- 
rection. 



trinsic stochasticity (thus, only approximate observational errors 
enter their like lihood computation ) . Finally, we refer the inter- 
ested reader to lHartig et alj ll201ll ) for a concise overview of the 
similarities and differences between the "approximate likelihood" 
an d ABC app r oache s to inference from statistical simulation, and 
to iNott et alj II2011T ) for an advanced treatment of the link be- 
twe en a particular version of ABC and the Bayes Linear technique 
(cf. iGoldstein fc Woofl 2007) underlying the model emulator ap- 
proach. 
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The structure of this paper is as follows. In Section[2]we 
review the publicly-available source catalogues and images 
comprising our high redshift demographic benchmark, then 
in Section[3]we present the core of our case study in ABC for 
astronomical model analysis. First, we describe our model 
for galaxy evolution and our procedure for stochastic simu- 
lation from this model fSection l3.1[) . Second, we explain the 
ABC algorithm and the SMC approach to its implementa- 
tion (Section 13. 2p . Third, we examine in depth the impor- 
tant process of constructing an optimal summary statistic- 
discrepancy parameter pairing (Section l3.3[) . And fourth, we 
confirm the similarity between our ABC and MCMC pos- 
teriors in the tractable "independent evolution" case, and 
present our final ABC-only posteriors for the more realis- 
tic, but likelihood intractable, "co-evolution" case (Section 
I3.4[) . In Section[4]we conclude this paper with a discussion of 
the implications of the model constraints so derived for as- 
trophysical theories of morphological transformation in the 
early Universe. 

We have thus attempted to organise our exposition of 
ABC in such a manner as to allow astronomers interested 
in this important statistical algorithm but not working di- 
rectly in the area of galaxy evolution to optionally skip over 
the technical details and justification of our model (Section 
13. ip without disadvantage (instead reading only Sections l3.2l 
13.31 and !3.4l in depth). All magnitudes are quoted in the AB 
system and a standard {Qm = 0.3, JIa = 0.7, h = 0.7} cos- 
mological model is adopted throughout. 



2 DATA 

Featuring a vast ensemble of multiwavelength imaging com- 
piled from both ground-based and space-based observatories 
the Extended Groth Strip (ECS) region of the Northern Sky 
(centred on RA: U h 17 m , Dec: +52°30') numbers amongst 
the premier legacy survey fields of the modern era. The All- 
waveleng th Extended Grot h strip International Survey team 
rAEGIS; |Pavis et al ,1 120071 ) has been responsible for the bulk 
of this data collection through extensive observational cam- 
paigns with HST and Spitzer. Such a comprehensive set of 
photometric measurements greatly facilitates the estimation 
of redshifts and stellar masses via Spectral Energy Distri- 
bution (SED) template fitting, and there exist a number 
of published studies characterising the high redshift galaxy 
population in the EGS to this effect. 

In this study we employ the publicly-available and up- 
to-date, ultra-violet- ( U V)-to - far- infrared- (FIR)-based cata- 
logue of iBarro et al.l (|201 lal lbh [Bll hereafter] to identify a 
complete sample of high mass (Mgai > lO n M ), early Uni- 
verse (1.5 < z < 3) systems. The Bll photometric redshifts, 
based on up to 19 band flux measurements in the survey 
core, feature an overall accuracy (measured against a spec- 
troscopic subsample from AGEIS with median z ~ 1.3) of 
Az/(1 + z) = 0.034 at a sub-2% catastrophic failure rate. At 
the highest redshifts (z > 2.5) comparison against a spectro- 
scopic sample of 91 Lyman-Break Galaxies (LBGs) confirms 
only a slight degradation to Az/(1 + z) = 0.069. The corre- 
sponding Bll stellar masses for these systems were derived 



using the PEGASE0 SED library (with Salpeter initial mass 
function and Calzetti extinction) — the choice of which (from 
amongst the wide range of alternative SED libraries) repre- 
sents the dominan t source of systemat ic uncertainty here (of 
order 0.1-0.3 dex; IBarro et alll2011bl ). For the purposes of 
this paper, in which our principle aim is to demonstrate as 
straightforwardly as possible the technicalities of the ABC 
approach, we hereafter neglect further quantitative consider- 
ation of these uncertainties (except when required for fitting 
the build up in number density over cosmic time, which con- 
tributes two "nuisance parameters" to our model, in Section 

ED}. 

The CANDELS team jGrogin et al.l l201ll ; 
iKoekemoer et all l201ll ) is currently engaged in the ac- 
quisition of high-resolution, near-infrared (and UV) 
imaging targeting distant galaxies in selected sub-regions 
of five key legacy fields (GOODS-N, GOODS-^f|, the EGS, 
COSMOEO, and the UD^H) totaling -800 arcmin 2 under an 
allocation of 902 orbits of HST/WFC3 exposure time. Driz- 
zled to a pixel scale of 0.06 arcsec, the presently-available 
epoch (egsOl) of imaging within (an —90 arcmin 2 sub-region 
of) the EGS field features a point spread function (PSF) 
full width half maximum (FWHM) of —0.18 arcsec and a 
5a detection limit of 26.8 mag in the F160W (f/-band) 
filter (with comparable coverage in the F125W filter). As 
such CANDELS already represents the highest quality 
dataset published to-date for the study of rest-frame optical 
morphologies at z ~ 2-3 in the EGS. Accordingly for the 
present analysis we derive our high redshift demographic 
benchmark from visual classification of all 126 members 
of the Bll catalogue at M ga i > 10 n Mo and 1.5 < z < 3 
imaged thus far. 

To this end one of us (EC) inspected each source care- 
fully in the CANDELS (HST WFC3/IR) #-band mosaic 
with ds9 and assigned it one of the following four type s: (1) 
spheroid (compact elliptical; cf. ISzomoru et al.ll2011bD . (11) 
spheroid- plus-disk (early-type di sk with a prominent central 
bulge; cf. ICameron et al"1l2011bh . ( hi) pure disk (late-type , 
bulgeless [perhaps clumpy] disk; cf . lElmegreen et al.ll2007ah . 
or (iv) ongoing merger (evident violent relaxation event in 
progress, as revealed by the presence of dis tinctive tidal fea- 
tures and /or multiple massive nuclei; cf. lElmegree n et all 
|2007bJ). In total we count 8 Type I spheroids, 9 Type 11 
spheroid-plus-disks, 90 Type ill pure disks, and 19 Type IV 
ongoing mergers in our sample. Example i/-band postage 
stamp images characterising these archetypal high redshift 
morphologies are presented in the lefthand panel of Figure 
EQ as is an illustration of the demographic evolution across 
our sample in the righthand panel. Once again in accordance 
with the expository aims of this paper regarding ABC we 
do not explore the (complex) possible impacts of classifica- 
tion subjectivity on our results — although we note that both 
ABC and the Bayesian framework in general offer a powerful 
statistical basis for marginalising over such sources of uncer- 

4 PEGASE: Projet d'Etude des G Alaxies par Synthese Evolutive 
ijFioc fc Rocca-V olmcrangc 199?]). 

5 GOODS-N/-S: Great Observatories Origins Deep -North/- 
South. 

6 COSMOS: COSMic evOlution Survey. 

7 UDS: UKIDSS (UKIRT [United Kingdom InfraRed Telescope] 
Infrared Deep Sky Survey) ultra-Deep Survey. 
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Figure 1. (Left:) CANDELS (HST WFC3/IR) H-band postage 
stamp images characterising the four archetypal morphologies 
present amongst massive (Af ga l > 10 11 Mq), high redshift (1.5 < 
2 < 3) galaxies in the Bll dataset. (Right:) An illustration of the 
evolving morphological mix amongst our Bll (CANDELS/EGS) 
sample. 



tainty (cf. iGelman et alj|2003t E.N. Taylor, in prep.; and see 
our treatment of various "nuisance parameters" in Sections 
I3.ll and l3.2p . particularly where the experimental evaluation 
of the classificatio n system ha s been appropriately designed 
and implemented l|Handll997l ). Reassuringly though, the rel- 
ative proportions of early and late type systems recovered 
from our classification proc ess are at least broadl y consis- 
tent with those reported by iBuitrago et al.l (|201 lh in their 
analys is of the (lower resoluti on) GOODS NICMOS Su rvey 
(GNS. IConselice et a.l.|[20lTbl ; also iMortlock et al.ll201ll ). 



3 STATISTICAL METHODOLOGY & RESULTS 

Here we begin by introducing our stochastic model for the 
morphological transformation of high redshift galaxies, de- 
tailing both the tractable "independent evolution" case and 
the intractable "co-evolution" case, in Section [3TTJ We then 
proceed to outline the SMC approach to ABC in Section[3]2] 
and to demonstrate linear regression-based construction of 
an optimal summary statistic-discrepancy parameter pair- 
ing for our model in Section 13.31 Finally, in Section 13.41 we 
compare the performance of SMC ABC against "tractable 
likelihood" -based MCMC in the "independent evolution" 
case and present our ABC-only solution for the more re- 
alistic "co-evolution" case. 



3.1 Morphological Transformation as a Stochastic 
Process 

With the current generation of SAMs yet to offer detailed 
or reliab le predictions for the morphologies of si mulated 
galaxies ([Almeida et al.ll2007l ; iGonzalez etaii r2009) we de- 
velop here instead a basic stochastic model for describing 
the competing processes of morphological transformation in 
the early Universe. In this endeavour we are motivated both 
by contemporary observational results and hydrodynamical 
simulations. The purpose of simulation in this study is thus 
not to work forwards through parameterised approximations 
for the physical laws of halo accretion, gas cooling, and star 
formation (amongst others) in order to constrain their "fun- 
damental" scaling coefficients (as in SAMs), but rather to 
explore in a rigorous statistical sense the extent to which 
the rates of incidence of the key events thought to shape 



morphological evolution are jointly constrained by the ob- 
served demographics. Nevertheless, working backwards from 
these constraints (our posterior probability densities) one 
may hopefully achieve insight into the underlying physical 
mechanisms, as we discuss in Section [4] 

As a starting point for our model we suppose that 
the arrival of galaxies at the top end of the high redshift 
stellar mass function may be faithfully represented as a 
non-homogeneous Poisson birth process with an underly- 
ing rate, \b(t), increasing as 10 K t J from a zero baseline 
at z = 6. Thus, on average (i.e., over an infinite volume), 

$>io"m (z) = Jo" ' Xb(t)dt = 10 K ^j- (modulo the im- 
pact of merging amongst M ga i > 10 Mq systems, which 
is intrinsica lly rare at these redshifts, i.e., negligible in this 
context; cf. iMan et all |2012| and our discussion in Section 
14.11) . In the lefthand panel of Figure [5] we illustrate the 
build up in number density at M ga i > 10 Mq over the 
interv al z ~ 1.5 to 6 synth esised from observations in the 
GN S (IMortlock et al.ll201ll . Mil), the MOIRC^Deep Sur- 
vey jKaiisawa et al.ll2009l. K09) the NEW FIRMj Medium- 
Band Survey 1 Marchesini et ail 201dl. M10; [b rammer et al.l 
|2011| . BR11), the UPS (|Caputiet alll201ll . Cll), and the 
EGS (the present study, C12). Interestingly, where their red- 
shift baselines overlap a number of these rival $>ioiia/q( 2 ) 
determinations exhibit surprisingly large discrepancies, even 
with regard to their respective cosmic variance uncertainties 
( marked as la error bars in Figure [2] follo wing the recipe 
of iMoster et al.1 l201ll ). As highlighted by iBrammer et al.1 
|201ll ) such discrepancies may well arise from the systematic 
errors inherent in SED-based stellar mass computation (ow- 
ing to degeneracies between the various template libraries), 
and we suspect this to be the case here. 

To estimate our birth rate parameters, K and 7, we 
thus perform standard MCMC exploration of the relevant 
posterior probability density space under a likelihood model 
in which the datapoints from each of the above-listed stud- 
ies are assumed subject to a common systematic bias in 
addition to cosmic variance. The prior magnitude of this 
systematic bias component (in dex) is treated as normally- 
distributed with A/"(0,0.1 2 ) for each survey. Our respec- 
tive priors on K and 7 are both Uniform, with the for- 
mer uninformative and the latter standard (i.e., bound be- 
tween zero and one). The joint posterior density for {K,y} 
thus recovered is roughly bivariate Normal with fx,-, ~ 
A/Trunc. ([-4.1, 0.65]', [0.06 2 , 0.1 2 ; p = 0.05]; < 7 < 1). For 
reference we plot the corresponding (pointwise) median, la, 
and 3a credible intervals for $>ioiimq( 2 ) against the var- 
ious empirical determinations shown in Figure [2] Due to 
the relatively small cosmic volume probed by the CAN- 
DELS/EGS dataset we do not attempt to further constrain 
K and 7 during our ABC analysis; instead we treat these two 
variables as "nuisance parameters" of our stochastic model 
and integrate them out at run-time (see Section f3 . 2 [) . 

It is important to note at this point that the marginal 
posterior density on the systematic bias in our EGS data- 
points favours a (median) of +0.11 dex, suggesting that the 
Bll stellar masses are systematically over-estimated by a 



8 MOIRCS: Multi-Object InfraRed Camera and Spectrograph. 

9 NEWFIRM: NOAO (National Optical Astronomical Observa- 
tory) Extremely Wide-Field InfraRed iMager. 
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Figure 2. (Left:) The build up in galaxy number density at stel- 
lar masses M ga i > 10 1:l Mq from z = 6 to 2 = 1.5 (in terms of 
cosmological time since z — 6) synthesised from recent observa- 
tional determinations of < I > >ioiimq ( z ) fr° m the literature. The 
K09, M10, Cll, and Mil datapoints shown here are derived from 
integration of their published Schechter mass function fits, while 
the BR11 datapoints are sourced directly from that paper. The 
error bar on each indicates the lcr contribution of cosmic vari- 
a nce for the respectiv e survey and bin width (following the recipe 
of lMoster et al.ll201lh ■ The dark, medium, and light grey curves 
plotted underneath represent our (pointwise) median, lcr, and 
3cr credible intervals, respectively, on the evolving mean number 
density given the posteriors of the fitted K and 7 for our non- 
homogeneous Poisson birth process model, A(,(t) = lO^t 7 . The 
median curve corresponds roughly to 7 ~ 0.65. (Right:) Illustra- 
tion of the birth rate by type in our model (cf. Section 13.111 . The 
rate at which sub-10 11 MQ galaxies are promoted above this mass 
threshold by merging is taken as W times our M ga i > 10 Mq 
merger rate, W\ m (t)\ leaving the rate of promotion (by star for- 
mation) of Type III disks, A t ;(t), as the remainder with respect 
the total birth rate, Aj,(t). 



correspondi ng ~0.10 dex (adopti ng the z ~ 1.5 mass func- 
tion slope of lMortlock et al1l201ll ). Hence, it is perhaps more 
appropriate to describe our Bll CANDELS/EGS dataset 
as an M ga i S 10 11 Mq sample, stressing the inherent uncer- 
tainty in SED-based stellar mass selection. 

We next suppose that each galaxy arrives at the top 
end of the high redshift stellar mass function as either a 
(star-forming) late-type disk (Type ill) or an ongoing major 
merger (Type iv) — a simplifying assumption which serves to 
reduce markedly the required dimensionality of our model, 
yet which is also consistent with the present state of knowl- 
edge on this topic. In a recent empirical census of rest-frame 
optical morph ology amongst the sub- 10 11 Mq population at 
1.5 < z < 3.5 ICameron et all (|2011bl ) were unable to iden- 
tify a single unambiguous spheroid b eyond z re 2.2 in their 
sample from the ERS (and see also IConselice et al.l l2011al 
for a similar result). Amongst the small fraction (~20%) of 
sub-lO^Mg spheroids discovered in their sample at later 
epochs only one was found to be actively star-forming — 
leaving dry merging as perhaps the only feasi ble (but also 
unlikely, cf. iLin et al |2010| ; IChou et al.l l201ll ) mechanism 
for sub-lO^Mg spheroids to thus move above this thresh- 
old mass without transitioning through a standard Type 
IV phase. Meanwhile, contemporary hydrodynamical simu- 
lations have demonstrated the theoretical potential for high 
redshift disks at 10 10 ' 5 -10 n MQ to sustain imm ense rates of 
star formation fueled by c o ld flow gas accretio n |Dekel et all 
120091 ; iBrooks et all 120091 : iGenel et all 1201(f ) while avoid- 
ing secular bulge assembly through the wind-driven disrup- 
tion of clump instabilities (|Genel et al.ll201ol ; Irlopkins et all 



l201lj ) — ensuring their rapid transition to the high mass 
regime intact as Type in systems. 

The probability of birth as a Type IV ongoing merger 
is estimated in our model as W times the ratio of the in- 
stantaneous merger rate amongst our Af ga i > W 11 Mq popu- 
lation, A m (tbirth)j to the corresponding instantaneous birth 
rate, Af,(£birth), as shown in the righthand panel of Figure 
[3 This factor, W, represents another nuisance parameter of 
our model corresponding to the ratio by number density of 
galaxies in such a mass range that a single major merger 
could promote them above 10 11 Mq to those already beyond 
this threshold. (There is an implicit assumption here that 
the merger rate does not evolve significantly with mass over 
this small baseline.) According to th e shape of the stellar 
mass function at the se redshifts (e.g. iBrammer et al.ll201ll ; 
iMortlock et alj|201ll ) we estimate W re 0.5 ± 0.2. Important 
to note is the fact that the merger rate so defined, A m (t), is 
strictly that of galaxies already at the top end of the stellar 
mass function — i.e., we are effectively eliminating the con- 
tribution to the observed merger fraction from galaxies that 
were sub-10 1 Mq prior to their most recent encounter. With 
Xb(t) the total birth rate and WX m (t) the birth rate of Type 
IV mergers the corresponding birth rate of Type III disks, 
Xd(t), is simply the arithmetic difference, Xb{t) — W\ m {i) 
(as indicated in the righthand panel of Figure [2]). 

As for the total birth rate described earlier, Af,(t), merg- 
ing in our model is characterised as a non-homogeneous Pois- 
son process, with a unique rate by volume of 



a m erge/3n 



A m (t) 



•■•merge /-'merge 



(t-t br )a„ 



for s£ t s£ tbr, 
1) for t br < t sS £1.5. 



*1.5— £ br 



Here Q morgc represents the baseline merger rate (in units 
of Mpc^Gyr" 1 ) at our lowest redshift, z = 1.5, with 
cVmerge Emerge the peak cosmological merger rate for massive 
galaxies at tbr- This latter model parameter, tbr, thus dic- 
tates a point of phase transition (or "break" ) beyond which 
the merger rate by volume must ultimately decrease (with 
increasing redshift) back to zero at z = 6 (the time origin 
of our model) at least as fast as the total number density 
of galaxies itself, lest the specific merger rate (per galaxy), 
A^ff ' become asymptotically infinite. Here we have chosen 
for simplicity a fixed, marginally sufficient decay rate for 
\ m (t) above this transition redshift of f 2 >7(«o.65)+i 0nc 
possible extension of our model, which may well be worth- 
while in the future if/when a larger demographic dataset for 
z ~ 1.5 CANDELS sources becomes available, would be to 
treat this pre-tbr decay rate as a free parameter of the fit. 

Previous empirical studies have argued alternately that 
the m assive galaxy merger rate is either very near con- 
stant Jde Ravel et al.l|201il : I Williams et~ai]|201li : lLotz et all 



120111; iMan et al.l 120121)" or mar k edly increasing with redshift 
dConselice et all I2003L l2009al: lLopez-Saniuan et all 120091 : 



Ide Ravel et al.ll2009l ; iBluck et al.ll2009l . l201ll ) from the local 
volume out to z ~ 1.5; and even less conse nsus exists regard- 
ing its behaviour to z ~ 3 and beyond (IBluck et al. l201ll ; 
IMan et al.ll2012l ; lLaw et al.ll2012l ; I Williams et al.ll20lj ). The 

intrinsic dumpiness of galaxy-scale star-formation at the 
observed optical (i.e., rest-frame UV) wavelengths of many 
of these studies (most non-WFC3) has proved a persistent 
source of uncertainty, introducing substantial ambiguity into 
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the interpretation of those morphological si gnatures other- 
wise i ndicative of recent merging locally (cf. IConselice et alj 
120031 : lLopez-Saniuan et alj|2009f ). Uncertainties concerning 
the fraction of a pparent close pairs to ultimately merge 
l|Lotz et al.l l2008h and the visibilit y timescales of t he re- 
sulting post-merger tidal features (|Lotz et al.l |2010| ) have 
only compounded these difficulties. (Important to note is 
that the full demographic analysis performed in the present 
study permits a simultaneous, non-degenerate constraint of 
the latter unknown, which is a significant advantage of this 
particular mode of analysis.) 

Our only inflexible constraints on the tuneable pa- 
rameters of X m (t) here are thus that a mergI! is, of course, 
strictly positive, /3 mC rge is greater than or equal to one, and 
te(= 0) < ibr 5= ti.s. Hence we adopt only weak priors spec- 
ified as: (1) a T distribution in logio-space for a me r g e [with 
H = —4, £ = 0.5, and 10 degrees of freedom, truncated 
to a key region of interest at a lower bound of —5.5 and 
an upper bound of —2.5]; (11) a Beta distribution in logio- 
space for 2/Smerge [with shape coefficients, 1 and 4, favouring 
a smaller peak-to-baseline ratio over a higher one]; and (ill) 
a Beta distribution for t D r (as a fraction of ti.s) [with shape 
coefficients, 2 and 1, favouring a break closer to z ~ 1.5 than 
z ~ 6]. The grey-shaded tiles and histograms in Figures [4] 
[9l and [10] offer graphical representations of these prior den- 
sities. 

Since, as mentioned earlier, dry-merging appears to 
be remarkably uncommon in the early Universe — cf. the 
rapidly declining fracti o n of red-red pairs with increasing 
redsh ift jLin et a.ll2oTol ; IChou et al.ll201ll ; [Kampczvk et alj 
l201ll 'l and the overa ll paucity of passive g alaxies, in gen- 
eral, above z ~ 2 jBrammer et all l201ll ; IWhitaker et al.l 
l201ll ; IWuvts et al.l l201ll ) — we assume that all mergers to 
occur under our model are gas-rich and therefore gener- 
ate a distinc tive post-merger morphology with irre gular 
tidal features (|Elmegreen et al.ll2007bl ; lLotz et alj|20ich . We 
model the (observed _ff-band) visibility timescale of these 
features according to a Gamma distribution with scale, 
1 + 100ri rr morph, and shape coefficient, 100, for ri rr m0 r P h 
in Gyr; thereby allowing an ~0.1 Gyr interquartile spread 
to account for some intrinsic variation in the cold gas frac- 
tion (and thus th e merger-to-stable-remnant transition time; 
lLotz et alj|20ld ) across the galaxy population sampled. In- 
spired by con temporary hydro dynamical simulations of gas- 
rich mergers jLotz et alj|20ld ) we choose our prior density 
on ri rr morp h to favour timescales on the order of 0.2 to 0.7 
Gyr (though permitting, at much low er prior density, the 
possibility of even >1 Gyr timescales; IConselicd l2009bh by 
adopting a Beta distribution with shape coefficients, 3 and 
5, on this parameter divided by 1.5. Upon fading of these 
post-merger tidal features we suppose the final remnant may 
assume either a Type I (pure spheroid) or Type 11 (spheroid- 
plus-disk) morphology with the probability of the latter out- 
come a tuneable parameter, Ps p h+D remnant- Given the on- 
going debate within the hydrodynamical modelling commu- 
nity regarding the relative frequency of mergers conducive to 
disk reformation at th ese epochs fe.g. lRobertson et alJfeOOfj ; 
iBournaud et al.ll201ll ) we adopt a Beta distribution prior on 
-Psph+D remnant with shape coefficients, 1 and 3, favouring 
Type n production in less than one in every four mergers. 

The second pathway of morphological transformation 
permitted under our model is that of secular evolution 
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Figure 3. Schematic illustration of the five characteristic path- 
ways of high redshift morphological transformation permitted un- 
der the stochastic model described in this paper (shown over an 
arbitrary timeline of 1.5 Gyr from "birth" to "observation" with 
an assumed Type III birth class). Solid lines are used to mark the 
fundamental pathways necessary to reach a given evolutionary 
state, whereas dashed lines allow for a range of possible degen- 
erate evolutionary histories prior to the most recent merger (the 
details of which are inconsequential to the final state achieved). 
The secular evolution pathway to Type II status and the null evo- 
lution pathway to Type III status are the only branches for which 
one could not substitute Type IV as the birth class here. 



of Type ill disks to Type II, the theoretical mechanism 
proposed to drive this process being the inwards migra- 
tion of massive star-forming clum ps as encounte r ed in 
certain hydrodymical simulations dBournaud et alj l2007i ; 
lElmegreen et alj |200S| ; iDekel et all l200d ). We again model 
this process stochastically via a Gamma distribution with 
scale, 1 + 50r scc ev, and shape coefficient, 50, for t SC c cv in 
Gyr; thereby inducing an intrinsic spread in this variable 
at run-time intended to mimic the impact of natural diver- 
sity in the structure and kinematics of high redshift disks. 
Though inwards migration has been well publicised as the 
favoured hypothesis of the SINSpl team to explain the char- 
acteristic morphologies and SEDs of the clump population 
hosted amongst members of their pioneering z ~ 2 survey 
l|Forster-Schreiber et aIll201ll ; lGenzel et al.ll201lh . as noted 
earlier the most recent hydrodynamical simulations incor- 
porating the effects of wind-driven mass loss, at least in 
the 10 10 ' 5 -10 11 M o regime, indicate that a large fraction of 
these clumps may b e too short-lived to migrate successfull y 
into a central bulge (|Genel et alj|20ld : iHopkins et alj|201ll ). 
We therefore adopt such a prior on the timescale for sec- 
ular bulge formation as to allow a full range of scenarios 
from rapid growth on sub-Gyr timescales (implying that 
many of our Type ill disks should transition to Type II be- 
fore z ~ 1.5) to incredibly slow growth on up to 10 Gyr 
timescales (implying none should). Mathematically we rep- 
resent our prior density on this parameter via a Uniform 
distribution in logio-space bounded between -1 and 1. 

Figure [3] illustrates schematically the five distinct path- 
ways of morphological transformation permitted under the 
above-specified model. We note for reference that Figures 3] 
[9] and [TTJ offer graphical representations of the prior densi- 
ties on all our model parameters. 



10 SINS: Spectroscopic Imaging survey in the Near-infrared with 
SINFONI [Spectrograph for INtegral Field Observations in the 
Near Infrared]. 
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3.1.1 Simulation from our Stochastic Model 

Having outlined above the principle details of our stochas- 
tic model for high redshift morphological transformation 
we now describe our corresponding procedure for simu- 
lating from this model under two distinct paradigms — 
"independent evolution" and "co-evolution" — the likelihood 
of the observed data under a given set of model parameters 
being tractable in the former and intractable in the latter. 
[Our derivation of the "independent evolution" likelihood 
function is given in the Appendix to this paper.] 



3.1.1.1 The "Independent Evolution" Case As the 

name suggests in the "independent evolution" case we sup- 
pose that neither the birth nor morphological transforma- 
tion history of any galaxy are ever coupled to those of an- 
other. Simulation from our stochastic model under this as- 
sumption for a given set of input parameters is then simply 
a matter of applying the above probabilistic transition rules 
to generate one-by-one a mock morphology at the observed 
redshift for each object in our benchmark sample as follows. 

First, the birth time of the galaxy at hand (i.e., the 
epoch at which its stellar mass finally exceeds 10 11 Mq) 
is drawn from the interval t§(= 0) ^ tbirth =S tabs ac- 
cording to the waiting time distribution dictated by its as- 
sumed (increasing-rate, non-homogeneous) Poissonian form 
(as derived in the Appendix to this paper). The birth class 
is then assigned as either Type ill or Type IV, with the 



WX m (t birtb ) 

^b(*birth) 



probability of the latter given by 

this ratio we must also sample a value for each of the 
nuisance parameters, K, 7 and W, according to /r-, 7 ~ 
A" Tr unc.(h4.1,0.65] , ,[0.06 2 ,0.1 2 ;p = 0.05]; < 7 < 1) 
and fw ~ A/Trunc.G" = 0.5, o~ = 0.2; W > 0) respectively 
(where A/rrunc. represents the truncated Normal distribu- 
tion). The number of mergers, n morgc , experienced between 
birth and observation is then drawn from the Poisson distri- 
bution specified by rate, = f' obs "V"/*? dt. If n mcrgc ^ 

''birth b V'' ) 

the corresponding epoch of last major merger is identi- 
fied by sampling from the relevant waiting time distribu- 
tion (also derived in the Appendix). The manifest dura- 
tion of the resulting post-merger (Type iv) irregular state 
is then drawn directly from the Gamma distribution with 
scale, 1 + 100ri rr m0 rph, and shape coefficient, 100; and if en- 
compassed within the remaining time until observation the 
galaxy is assigned either Type I or Type n morphology, with 
the probability of the latter set by Ps p h+D remnant (otherwise 
it finishes the simulation as a Type iv). Finally, galaxies 
born as Type ill disks and experiencing no major mergers 
may yet evolve to Type 11 via secular evolution, determined 
likewise by comparing an evolutionary period drawn from 
the Gamma distribution with scale, 1 + 50r scc ev , and shape 
coefficient, 50, against the time available between birth and 
observation. 

Simulation from our model is thus inherently 
stochastic — i.e., the internal assignment of birth times, 
most recent merger times, and so on (and thereby the 
output assignment of final morphologies) will vary from 
run to run at fixed input. In the SM C approach to ABC 
dChopinl 120021: iDel Moral et all l200fj ; ISisson et al] 120071 ; 
iDrovandi &c Pettittl I2010T I the effects of this stochasticity 
are accounted for in an efficient, consistent manner through 



the iterative application of the key rejection and resam- 
pling/refreshment steps described in Section f3. 21 

As noted earlier a characteristic feature of our model 
in the "independent evolution" case is that the likelihood, 
P(y\G), of the observed data under a given set of in- 
put parameters is, in fact, tractable (whereas in the "co- 
evolution" case it is not). For expository purposes this al- 
lows us to reconstruct via standard ("tractable likelihood" - 
based) Bayesian computational methods (namely, MCMC) 
the "true" posterior probability density of our model param- 
eters (modulo the inherent variance of MCMC simulation) 
as a benchmark for comparison against our ABC results. 
The derivation of this likelihood function is rather involved 
so we present details separately in the Appendix (along with 
a description of the MCMC scheme employed). The result- 
ing "true" posteriors are, however, presented here in Figure 
[4] for reference during our discussion of summary statistics 
in Section 13.31 and the accuracy of our ABC posteriors in 
Section 13.41 



3.1.1.2 The "Co-Evolution" Case Though it ensures 
the likelihood tractability required for our demonstration of 
the robustness of SMC ABC with respect to MCMC in Sec- 
tion below our initial assumption that galaxy evolution 
proceeds independently across our entire sample may not 
be physically realistic. A number of recent studies probing 
high redshift c l usters and proto-clusters o ut to z ~ 1.5-3 
JPohertv et~ai1 l2010l; lHatch et alj l201ll; " 
120111; ISpitler et al.l l201ll; iTanaka et all ' 



. Papovich et alj 
20 111 ) and beyond 



To compute ( Capak et al.l l201ll : ICarilli et al.l 1201 j ) have presented 



Si 



evidence to suggest that the evolutionary histories of 
intermediate mass galaxies within these early over-densities 
are in fact highly correlated such that they consistently 
achieve peak star formation earlier than their counterparts 
of similar mass in the field. Indeed such early biasing by en- 
vironment of star formation and mass accretion constitutes 
a fundamental prediction of hierarc h ical formation theor 
under ACDM (|Springel et all 120051 ; lOverzier et~ai1 1200 
and should already be manifest in the spatial distribution of 
the fi rst generation of (re-)ionising sources l|Kramer et alj 
2006). As usual though, empirical results for galaxies at the 
top end of the stellar mass function remain limited due to 
the intrinsic rarity of these systems. 

The beauty of ABC, of course, is that it permits the 
study of arbitrarily complex stochastic models irrespective 
of likelihood tractability, allowing one to relax such sim- 
plifying assumptions as that of "independent evolution" in 
the present example. In this Section we thus outline a "co- 
evolution" case of our model in which a physically plausible 
coupling is introduced into the formation times of galaxies 
in close pairs and small groups, and in Section ^. 4l we explore 
the impact of this coupling on our ABC posteriors. 

In Figure [S] we illustrate the nature of spatial cluster- 
ing amongst the massive (M ga i > W 11 Mq) galaxies of our 
Bll (CANDELS/EGS) sample at 1.5 <z < 3, representing 
their 3D distribution in observed right ascension (RA), dec- 
lination (DEC), and redshift via 2D projections in comov- 
ing distance along the line of sight (LOS) and the axes of 
RA and DEC, alternately. Neighbouring systems separated 
by no more than 2.5 Mpc — a conservative linking scale for 
proto-group-sized over-densities in the early Universe (cf. 
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Figure 4. Benchmark posterior probability densities for the key parameters of our stochastic model of morphological transformation at 
high redshift (in the "independent evolution" case) recovered from "tractable likelihood" -based MCMC simulation. In each of the main 
diagonal panels we compare the marginal posterior density of a single parameter (in blue) against its prior (in grey), while in each of the 
off-diagonal panels below we extend this comparison to the joint density formed by pairing that parameter against one of its peers. For 
the latter visualisation we employ a lattice of variable-sized points to trace the MCMC posterior on a scale of 1, 2.5, 7.5, and 15 times 
some appropriate baseline probability density, while grey-shaded tiles map the corresponding prior on an identical scale. In the upper 
right panel we plot the (pointwise) la and 3<r credible intervals and median curve (in dark grey, light grey, and blue respectively) for the 
Af ga i > 10 11 AfQ merger rate, A m (t), deriving from our (joint, marginal) posterior densities on a m ergc, Ancrge, and % r . 



ICapak et al.ll201ll and references therein) — are marked ac- 
cordingly and highlight the diversity of high redshift "envi- 
ronments" in the survey volume, with eleven simple pairings 
and one threesome identified at the adopted linking scale 
and a majority of relatively isolated systems. Perhaps the 
most striking feature on first inspection of this plot, however, 
is the apparent void at ~150 Mpc distance from the lower 
bound of our sample at z ~ 1.5. Examination of the Bll 
photometric redshifts for the full EGS, however, reveals no 
indication of this under-density extending across the wider 
field, rea ssuring us that it is indeed an imprint of cosmic vari - 
ance (cf. lTrenti fc Stiavellilliooi : iDriver fc RobothanfeOlfj ) 



within the small volume probed by the present dataset — 
and not the result of a systematic bias in the adopted SED 
fitting algorithm, for instance. 

To explore the impact of small scale clustering on the 
"birth" time distribution as defined in our stochastic model 
for morphological transfo rmation we refer to t he pu blicly- 
available mocks from the |Pe Lucia fc Blaizotl |2007l ) SAM 
embedded in a small volume of the Millennium Simulation^ 
of comparable size to that probed by the Bll datasetQ In 

11 URL( http://galaxy-catalogue.dur. ac.uk:8080/Millennium/ ). 

12 Recall that for the purposes of the present analysis we treat 
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Figure 5. The spatial distribution of massive (Af ga j > 10 11 Mq) 
galaxies in our Bll (CANDELS/EGS) sample at 1.5 < z < 3, pro- 
jected in terms of comoving distance onto the LOS-RA and LOS- 
DEC planes. The blue circles overplotted highlight our eleven 
pairs and one threesome of galaxies neighbouring within 2.5 Mpc. 
The colour/symbol code for galaxy types is identical to that of 
Figure [l] For reference we also mark the LOS axis on a scale of 
time since z = 6 (the origin point of our model). [See the text for 
a comment on the apparent void at ~150 Mpc] 



to the inverse square of distance between that point and 
the corresponding latent variable from the secondary set. 
The remaining points in the primary set are then randomly 
allocated with equal selection probability to the many iso- 
lated galaxies of our dataset. Each point on the interval 
[0,1] thus assigned is then transformed to a birth time for 
its matching galaxy through multiplication by the relevant 
Ai,(i bs) (see, for reference, the Appendix to this paper) fol- 
lowed by inversion to recover ibirth- We have confirmed by 
repeat simulation that this process of birth time sampling 
consistently produces median absolute birth time differences 
for both neighbouring galaxies and random pairings ofsim- 
ilar magnitude to those of the IDe Lucia fc Blaizod (|2007f i 
SAM described above. Note also that by allocating from an 
initial uniform sample in this manner we naturally preserve 
the mean build up rate of < J > >ioha/q( 2 ) specified by our fit 
of K and 7 in A(,(i) against the available observational data 
from K09, M10, BR11, Cll, Mil, and C12 (see the lefthand 
panel of Figure [2} . 



line with our suspicion from Section [3TT] of an ~0.1 dex bias 
in the Bll stellar masses our first discovery here is that 
the number density of simulated galaxies in these mocks 
at a cut-off of M ga i > 10 Mq is far below that of our 
CANDELS/EGS sample, but may be brought into reason- 
able agreement if we revise our selection down to (at least) 
Mg a i > 1O 1O,9 M0. Following this adjustment we identify 
12 pairs, 2 threesomes, 3 foursomes, and even a five mem- 
ber configuration in this ~ 7 x 10 5 Mpc 3 "snapshot" of the 
IDe Lucia fc Blaizod (|2007l ) SAM at z - 1.5. Interestingly, 
whilst we do find a strong correlation between the birth 
times of galaxies in close associations — with a median ab- 
solute difference of only ~0.3 Gyr for neighbours within 2.5 
Mpc compared against ~0.65 Gyr for randomly assigned 
pairs — despite some theoretical expectation there exists neg- 
ligible evidence in these mocks for a systematic bias in this 
specific aspect of galaxy formation. Hence, we do not at- 
tempt to induce one arbitrarily into our stochastic simu- 
lations; instead we focus here simply on reproducing the 
observed correlation using the following modified sampling 
scheme. 

Rather than drawing independent birth times one-by- 
one for each galaxy in our sample as in the "independent 
evolution" case described above, in the "co-evolution" case 
of our model we in fact generate a complete set of birth 
times at the start of the simulation and distribute these 
thereafter with an environmental dependence. We achieve 
this by drawing from the standard Uniform distribution a 
primary set of 126 points, one for each galaxy in the Bll 
CANDELS/EGS dataset, plus a secondary set of 12 points, 
one for each close pair or threesome earlier identified (see 
Figure [5} — the latter serving as "latent variables" for the 
coupling of birth times in these associations. To each galaxy 
in a close pair or threesome we then allocate a single point 
from the primary set with selection probability proportional 

the time of "birth" in our model as the epoch at which a galaxy 
first reaches the top end of the high redshift stellar mass function, 
wh ether via star format i on or merging. To identify this point in 
the IDe Lucia fc Blaizol l|2u07h mocks one must follow back the 
linked progenitor tree accordingly via SQL query. 



3.2 SMC ABC: Sequential Monte Carlo 
Approximate Bayesian Computation 



As mentioned in the Introduction, Approximate Bayesian 



Computation (cf. iTavare et al.ll 1997 


: IPritchard et al.l 


199S 


Beaumont et al. 2002; Sisson et al.l 


20071; IWilkinsonl 


2008; 


Csillerv et al.l l20ld: iDrovandi & Pettitd l20ld. 12011) 


offers 



a rigorous statistical framework for estimating the posterior 
probability densities of key scientific parameters under com- 
plex models for which the likelihood of the observed data 
may be entirely intractable (thus prohibiting application 
of the standard MCMC approach, for example). To con- 
duct ABC one requires only a stochastic model from which 
the observed data, y, are believed to be a random draw 
given some unknown set of intrinsic (true) input parameters, 
and a discrepancy measure for the comparison of simulated 
data, y s , against observed, p(S(y), S(y s )), typically based 
on a set of low order sum mary statistics, S(-). Following 
IDrovandi fc Pettitd (|201Ch the aim of ABC may be stated 
formally as the recovery of unbiased samples from the distri- 
bution described by the approximate joint posterior density, 

my s \p(S(y),S(y s ))<e T )K {^f^ W 

XL p(S(y),S(y s ))^e T , 

where represents a vector of unknown model parameters, 
n(0) the prior density on those parameters, and et some 
target tolerance for the specified discrepancy measure be- 
tween simulated and observed data. The indicator func- 
tion lp(S( y ),s(y s ))sce T assumes value unity for simulated- 
observed dataset pairs with metric distance below this tol- 
erance, and zero otherwise. 

The archetypal scheme for random sampling from the 
dist ribution defined by E quation [T] is that of rejection ABC 
(cf. IPritchard et~aLK l999) in which one draws a large sam- 
ple of N trial parameter vectors, Oi (i = 1, . . . , N), from the 
prior, 7r(0), simulates a corresponding dataset for each, y s , 
and then rejects all Oi for which the discrepancy between 
simulated and observed data exceeds some target tolerance, 
i.e., p(S (y) , S (y s )) > &r- Unfortunately, except in (unreal- 
istically) fortuitous (or trivial) circumstances in which the 
prior is already very close to the posterior, when explor- 
ing high dimensional parameter spaces (N pal > 3) under 
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rejection ABC one will often face an undesirable trade-off 
between an impractically low acceptance rate or an uncom- 
fortably large tolerance — either way the approximation of 
the true posterior by the ABC posterior will likely be poor. 

As a more efficient alt ernative the Sequential 

200 it 



Monte Carl o (cf. Liu 
to ABC dDel Moral et al 



, [Chopin 
120061 : 



20021) ap proach 

.Sisson et all |2007| ; 
iDrovandi fc Pettittl l201Ch proposes instead to simulate 



from Equation [T] step-wise by evolving a dynamic pop- 
ulation of "particles" (with each particle representing 
a single vector of input model parameters) through a 
sequence of intermediate distributions characterised by 
/(y S |0)7r(0)l p (SCy),S(ya))<e i for t = l,...,T, indexing a 
series of non-increasing targets, e t . The two key stages 
of the SMC algorithm are thus: (1) rejection of the most 
discrepant particles under each target; followed by (n) 
resampling from amongst the least discrepant particles with 
some refreshment mechanism applied to maintain particle 
diversity. SMC may therefore also be referred to as "particle 
filtering" or "population Monte Carlo" (PMC; for examples 
of the rece nt adoption of PMC [SMC] in a cosmolo gical 
context see IWraith et af] 120091 . iKilbinger et~ai1 l20ld . and 
references therein). 

In this study we employ for rejection th e self-generating 
target strategy of lDrovandi fc Pettittl (|2010h in which the se- 
quence of incremental targets, et, is chosen on run-time such 
that a fixed fraction, a, of all particles are dropped at each 
iteration (herein a = 0.75). We then restore the particle 
population to its full operating size, TV, by resampling with 
replacement from amongst the remaining (1 — a)N parti- 
cles. Population refreshment is achieved by application of an 
MCM C kernel to these replicates. As in IDrovandi fc Pettittl 
(2010) we favour the use of a self-refining, Metropolis- 
Hastings proposal distribution based on the current particle 
sample mean vector, p., and covariance matrix, S. For the 
specific model at hand we adopt a truncated, multivariate 
T distribution of degree 10 with the truncation bounds set 
by the support of our prior densities (as detailed in Section 
ETTj) . At each iteration of the SMC algorithm this MCMC 
kernel is run a fixed number of times, R, to (hopefully) 
produce genuine refreshment in a fraction, c, of the resam- 
pled particles (with some particles, of course, likely to be 
moved multiple times). The requisite R is here estimated 
according to the empirical efficiency, p a cc, of the previous 
MCMC kernel as R = , iffiizg} . Note that the "likelihood 
ratio" in the corresponding MCMC acceptance computation 
is simply l P (S( y ),s(y s (e propoBcd )))s;e t , i.e., all trial particles for 
which the simulation produces a mock dataset with discrep- 
ancy distance falling below the current target are assured a 
non-zero probability of acceptance^ Our final target, et, 
is defined pragmatically (with respect to the limitations of 
our computational resources) as that e t for which a fur- 
ther SMC rejection-resampling step would incur more than 
i?max ~ 100 applications of this MCMC kernel. 

Treatment of Nuisance Parameters A particular 
feature of the model adopted in this study is the appear- 



ance of three nuisance parameters — essential inputs known 
only to a limited accuracy from previous studies, but un- 
able to be robustly constrained via the present observa- 
tional dataset. Namely, = {K, 7, W} where /k,7 ~ 
Ar Tr u„c.([-4.1,0.65]',[0.06 2 ,0.1 2 ;p = 0.05]; < 7 < 1) 
and f w ~ AfTruncOu = 0.5, a = 0.2; W > 0). Upon 
each simulation from our model for a single particle, 8i, 
we draw a random value for each of K, 7, and W from 
their respective distributions for use in that one instance. 
Heuristically this amounts to approximating the integral, 
We) fivA^ii ®)/(©)d@, by a single Monte Carlo sample, 
yet produces through the power of the SMC ABC algorithm 
(i.e., via inference over a particle population en masse) a rea- 
sonable approximation (converging asymptotically) to sam- 
pling from Equation [T] 



3.3 Refinement of Summary Statistics 

As explored in a number of recent papers careful selec- 
tion of the summary statistic(s) used to evaluate the dis- 
crepancy between simulated and observed data in ABC is 
of paramount importance to achieving accuracy and effi- 
ciency with this algorithm, wheth er employed for the pur- 
pose of parameter estim at ion (e.g. Joyce fc Marjoram! 200§l : 
iNunes fc Balding! I gOld ; IFearnhead fc Prangld I2012T) or 
more challengingh| 14 | . Ba y esian model choice (jRobert et alj 
1201 ll ; iMarin et alfBoill ; iBarnes et all 1201 ll ). Indeed the 
same is true for all approaches to inference from com- 
plex models for which sufficient statistics are unavailable , 
including those of "appr oximate likelihood" (|Woodl I2010T ) 
and model emulation (cf . iBower et al.l |2010| and references 
therein), and should thus not be thought of as a unique con- 
cern for ABC-based analyses. 

For some stochastic models (including that of the 
present case study in morphological transformation; cf. 
Section 13. 1|) the nature of the output data may well be 
such that only a few basic modes of summary are of any 
likely val ue, wh i le for others (e.g. the Ricker map case 
study of IWoodl l20ld ; and most SAM-based studies of 
galaxy formatiorF^l) there may in fact be many in a rich 
hierarchy of complexities — either way though it will rarely 
be profitable to carry more than one or two into a full ABC 
analysis, owing to the direct relationship between Monte 
Carlo error and summary dimen sion (cf. I Beaumont et al.l 
|2002| ; IFearnhead fc Prangld I2012T ). Hence in this Section 



13 When using a non-symmetric proposal distribution as in the 
present case the ratio of sampling densities joins, of course, the 
likelihood ratio and the prior density ratio in computing the full lation 
MCMC acceptance probability. lation 



14 Application of ABC to the problem of Ba yesian model choice 
fcf. lGrelaud et al.l2009l ; lToni fc Stumpfl20irj) is far from straight- 
forward as an unfortunate summary statistic selection can lead 
to disasterously in correct Bayes fact ors, even as y mpto tically 
llRobert et al.ll201lf) . Recently though, IMarin et ah! d201lT ) have 
made substantial progress in this field by establishing necessary 
and sufficient conditions on the validity of candidate summary 
statistics for this purpose. 

15 With the choice of summary statistic typically (re-)cast 
in terms of the choice of reference dataset in these as- 
tronomical studies — namely, whether to constrain, for in- 
stance, against the luminosit y function jBower et all |2010| ; 

iLu et al.ll201ll; ICirasuolo et aljkqiCh. the Tully-Fisher relation 
llvan den BoscblboOrilTonini et al.ll201llh the mass-metallicity re- 



pTpuroet^dTl20*09h . a nd /or the black hole— bulge mass re- 
Henriques et alj|2009lh 
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we complete our exposition of the ABC technique with a 
demonstration of two contemporary approaches to this par- 
ticular selection problem: first, we apply Nunes & Balding's 
(2010) two-stage procedure of distributional entropy and 
MRSSBr^l minimisation to identify the optimal choice from 
amongst a candidate set of four "naive" summary statistics 
for our morphological dataset; and s econd, we employ the 
so-cal led "semi-automatic" scheme of iFearnhead fc Prangld 
(2012) to build an alternative set of summary statistics 
optimised with respect to the recovery of posterior means, 
validating their performance in comparison against the 
former. 

Minimum Entropy/MRSSE-Based Selection The 

most natural mode of summary for population demographic 
data in the context of extra-galactic astronomy is, of course, 
by way of type counts or type frac tions in similar-sized bins 
of redshift (see lOesch et ail |2010| and iBuitrago et all l201ll 
for recent examples). Important considerations when com- 
piling such summary data for the purpose of ABC analysis 
are then the number and placement of bins to use and the 
weights one should assign to the type counts/fractions ob- 
served therein. As a "naive" first attempt at constraining 
the parameter space of our model in the "independent evo- 
lution" case we thus trial four alternative summary statistics 
based on progressively finer subdivisions of the Bll CAN- 
DELS/EGS dataset by redshift. Mathematically, we define 
this class of "naive" summary statistics, S, via the generic 
column vector 



,(fc) ,(fc) ,(fc) ,(/=),/ 
' •'I ' -'II ' •'III ' ^IV J ' 



with fj 1 ' the fraction of Type I galaxies in the first of k 
redshift bins subdividing equally the interval 1.5 < z < 3, 
/jj the fraction of Type n galaxies in the aforementioned 
bin, and so on. Adopting equal significance weights across 
all bins, we thus establish a complete discrepancy distance, 



(2) 



P (S(y), S(y s )) = VtSobs - S sim )'(S obs - S aim ). 

F ollowing the two-stage procedure of iNunes fc Balding! 
(2010) we begin the evaluation of our four "naive" candi- 
dates, S : k = {1, 3, 6, 12}, by computing the fourth-nearest 
neighbour entropM 17 ! of the posterior distribution resulting 
from simple rejection ABC under each — the goal here being 
to exploit the (approximate) inverse relationship between 
entropy and information in order to identify the most "in- 
formative" summary statistic with regard to inference of the 
model parameters at hand. In the present round of rejection 
ABC experiments we accept only the 100 least discrepant 
particles of an initial sample of 5,000 drawn from the prior, 
and we compute the associated entropy statistic for each 
according to the formula, 



16 MRSSE: Mean square Root Sum of Standard Errors. 

17 One may note an intriguing similarity between the use of 
fourth (or fifth) nearest neighbour-based estimators in both sta- 
tistical studies of distributional en tropy and in astron omical stud- 
ies of large-scale environment (cf. iBaldrv et al.l[2006 ) — though it 
is unlikely there exists an underlying significance to this beyond 
the desirable error properties of the n ~ 4-5 choice ( Singh et al. 
l2003h . 
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Figure 6. Evaluation of candidate summary statistics for model- 
data comparison across a range of binning schemes (k = 
1, 3, 6, 12) for both our "naive" S and optimised S* (the latter ex- 
plained later in this Section) via the twin diagno stics of (Left:) dis- 
tribu tional entropy and (Right:) (M)RSSE (cf. lNunes fc Baldinj 
2010). In each instance the marked datapoint reveals the median, 
and the error bars a corresponding 95% confidence interval, re- 
covered from six rounds of rejection ABC (i.e., selection of the 
100 least discrepant particles out of an initial 5,000 drawn from 
the prior density) . Note that as the posterior means of our model 
parameters under the "independent evolution" case studied here 
have already been well approximated via our earlier ("tractable 
likelihood" -based) MCMC simulation we have employed these di- 
rectly to estimate a pseudo-RSSE, rather than forming an MRSSE 
from repeated ABC runs ag ainst simulated datasets "close" to the 
real one as in the canonical Nunes & Balding (2010) procedure. 



H = log 



r JV par /2 



r(ATpar/2 + 1) 



V(4) + logn - 



A, : , 



(Singh ct al. 2003; with A par [= 6] representing the dimen- 
sion of our model parameter space, n\= 100] the num- 
ber of accepted particles, F(-) and ip(-) the "gamma" and 
"digamma" functions, respectively, and Ri t 4 the fourth near- 
est neighbour distance). There exists, of course, a certain 
subjectivity in the choice of scaling for each parameter in the 
computation of Ri,4,; one option would be to first standardise 
all parameters on the interval [0, 1], however, in this case we 
prefer instead to standardise against the diagonal matrix of 
our prior variances, V, such that \ \6i, 6j\ \ = « s f6' i V~ 1 0j. By 
repeating the rejection ABC process six times for each k one 
may estimate both the median entropy and matching 95% 
confidence interval (from the range) under that particular 
binning scheme. The results of this analysis are presented in 
the lefthand panel of Figure [6] 

Interestingly, although one might, at face value, expect 
a monotonic relationship of decreasing posterior distribu- 
tional entropy with increasing k on the basis that finer bin- 
ning should break any false degeneracies in the posteriors 
recovered from ABC runs with fewer bins (i.e., in some 
sense increase the information return) , this is not necessarily 
true in practice owing to the simultaneous increase in Monte 
Carlo error, as the present example demonstrates. Although 
we do observe a slight decrease in distributional entropy 
upon moving from one to three redshift bins the opposite is 
true for our six and twelve bin trials under this particular 
mode of summary (see the lefthand panel of Figure [6} . 

Following identification of t he en tropy-minimising S 
(here k = 3) INunes fc Balding! l|2010l) recommend a sec- 
ond (rather computationally expensive) round of evaluation 
against the formal optimality criterion of Mean square Root 
Sum of Squared Errors (MRSSE), to be estimated via a new 
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Figure 7. Comparison of the marginal posterior density for 
ctmerge in the "independent evolution" case of our model recov- 
ered from SMC ABC under two of our "naive" summary statistics, 
S : k — 3 and S : k = 12). In each case the dotted line represents 
the posterior after one rejection— resampling iteration (m = 1) of 
the SMC ABC algorithm, the solid datapoints the same after four 
iterations (m = 4), and the solid blue line the benchmark pos- 
terior from "tractable likelihood" -based MCMC. At the bottom 
of each panel we indicate also the position of the posterior mean 
for this parameter under each of our SMC ABC runs as well as 
from our MCMC benchmark (the latter highlighted with a blue 
diamond). 



series of rejection ABC analyses against simulated datasets 
constructed under parameter vectors revealed by the orig- 
inal ABC runs for the entropy-minimising S as (likely to 
be) "close" to those responsible for the observed dataset. 
Since the (marginal) posterior mean of each model parame- 
ter under the "independent evolution" case studied here has 
already been well approximated via our earlier ("tractable 
likelihood" -based) MCMC simulation (see Section |3.1|) we 
may take a shortcut to the truth (and vastly reduce our 
computational burden) by employing these directly to es- 
timate alternative pseudo-RSSE scores for each of our pre- 
vious rejection ABC runs. The results of this analysis are 
presented in the righthand panel of Figure [5] The relation- 
ship between pseudo-RSSE score and binning k observed 
here mirrors closely that exposed by our original entropy 
evaluation, validating S : k = 3 as the optimal choice from 
amongst our candidate set of "naive" summary statistics. 

The value of the above optimisation procedure may eas- 
ily be appreciated from inspection of Figure [7] in which we 
compare the marginal posterior for one of our key model 
parameters, a mC rgc, recovered from full SMC ABC analy- 
sis under S with k = 3 against that for k = 12. As in 
all SMC ABC runs reported herein we use a population of 
10,000 particles, iterated through an adaptive threshold de- 
fined by a rejection rate of a = 0.75, followed by MCMC 
kernel-based resampling with a goal refreshment rate of no 
less than c = 0.90. A total of four iterations were achieved 
under this scheme before the limits of our computational 
resources were reached (at R > -R max [= 100] required appli- 
cations of the MCMC kernel for such a level of refreshment). 
Although both (marginal) posteriors appear rather similar 
after only one iteration it is evident by the fourth (and fi- 
nal) iteration that the S : k = 3 statistic has produced a 
particle population tracing far more faithfully the density of 
our benchmark ("tractable likelihood" -based) MCMC sim- 
ulation than its S : k = 12 counterpart. 

The "Semi- Automatic" Schem e In an important 
contri bution to the ABC literature iFearnhead fc Prangld 
|2012l ) have recently demonstrated that the optimal sum- 



mary statistic for estimation of model parameters under 
quadratic loss (i.e., optimality with respect to the recovery 
of posterior means) is simply the conditional expectation 
function, E(0\y). As a direct consequence the authors were 
thereby able to propose and justify a regression-based algo- 
rithm for the direct construction of well-behaved summary 
statistics, allowing one (in principle) to by-pass the above 
process of searching through a "naive" set of often unsat- 
isfactory candidates. To implement their so-called "semi- 
automatic" scheme one must first generate a "reasonably 
large" sample of model parameter-simulated dataset pairs 
spanning a "relevant volume" of parameter space. In lower 
dimensional analyses one may simply draw this sample di- 
rectly from the prior, though the posterior density from a 
trial ABC run with some "naive" summary statistic will 
generally serve as a superior starting point. Least-squares- 
based fitting to this reference dataset of the relation, 6i = 



p£' +{3i'f(y) + ei, for each model parameter, 0t e 0, yields 
the optimal summary statistic, S* = (3o + /3i/(y)o Here 
et denotes a symmetric error term of zero mean and f(y) 
some vector-valued function of the data, which for y e R p 
will be typically of the form f(y) = y, f(y) = {y,y 2 }', or 
similar — the lack of a universally appropriate algorithm for 
defining this regression function being the chief cause for 
classification of the above scheme as "semi-" rather "fully" 
automatic. 

Since the raw output, y, from our stochastic model 
for high redshift morphological transformation is, in fact, 
multinomial (rather than real- valued) we adopt here (for the 
purposes of computational efficiency) a modified regression 
function of form, f(y) = S(y), with S(-) denoting as above 
the compilation of type fracti ons in fixed bins of r e dshift . 
Under this adaptation of the IFearnhead fc Prangld (|2012l ) 
approach the magnitude of each component in each fitted 
fii may be considered a weight for the importance of that 
type fraction and redshift bin in estimating the correspond- 
ing (i-th) model parameter. As in our earlier computation 
of fourth-nearest neighbour distances we employ our prior 
variance matrix, V, to establish the full discrepancy mea- 
sure under this new summary statistical. 



p(S*(y),S*(y s )) = (S* hB 



sij'v-^s, 



* 

obs 



Fitting of /3o and /3i was achieved here by application of 
the glm and step routines in R to the 100 least discrepant 
particles from each of the six rejection ABC runs conducted 
earlier under our "naive" summary statistic for k = 3 (re- 
sulting in a full calibration sample of 600 model parameter- 
simulated dataset pairs). Notably, the step routine in R 
makes use of the AIC (Akaike Information Criterion) statis- 
tic to restrict each fit to only those elements of S contribut- 
ing significantly to the prediction of 8i. 

Although k = 3 proved to be the optimal binning 
scheme for the set of "naive" summary statistics examined 
above one cannot simply assume this to hold for the new 
S* , so we again examine the merits of each alternative, 



18 In fact, as noted bv lFearnhead fc Pranelel j2012F) , since in ABC 
analysis we are only interested in the difference, S* hs — S*, , the 
vector, /3o, may well be omitted from this above definition. 

19 Another (well-motivated) alternative choice of scaling here 
would be the sample covariance matrix of the posterior particle 
population from our earlier run of SMC ABC under S : k — 3. 
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Figure 8. Comparison of the marginal posterior density for 
ctmerge in the "independent evolution" case of our model recov- 
ered from SMC ABC under the alternative summary statistics, 
"naive" S : k = 3 and optimised S* : k = 3. In each case the dot- 
ted line represents the posterior after one rejection-resampling 
iteration (m = 1) of the SMC ABC algorithm, the solid data- 
points the same after four iterations (m = 4), and the solid blue 
line the benchmark posterior from "tractable likelihood" -based 
MCMC. At the bottom of each panel we indicate also the posi- 
tion of the posterior mean for this parameter under each of our 
SMC ABC runs as well as from our MCMC benchmark (the latter 
highlighted with a blue diamond). 

k = {1,3,6, 12}, following iNunes fc Balding! l|2O10l) . The re- 
sults of this analysis are overlaid against our measurements 
for the "naive" S in Figure [6] Reassuringly, with the excep- 
tion of the most limited (fc = 1) binning scheme these new 
summary statistics significantly out-perform the old in the 
pseudo-RSSE criterion for which they are designed. Despite 
our caution k = 3 does again appear to consistute the best 
choice of binning, though fc = 6 is not far behind in accuracy 
and may also offer slightly lower entropy. In Figure[H]we com- 
pare the marginal posterior density recovered for the model 
parameter, a mcrgc , following a full run of SMC ABC under 
S* : fc = 3 against that obtained earlier under S : k = 3. 
Interestingly, though our "nai've" summary provides a vi- 
sually "closer" fit to the shape (especially the width, i.e., 
standard deviation) of the benchmark MCMC density for 
this parameter, the optimised statistic does outperform it 
with regard to the recovery of the posterior mean (which it 
manages within a small tolerance after only a single itera- 
tion of the SMC ABC algorithm). Hence, given both its ease 
of implementation and its demonstrated effectiveness in the 
present analysis we c an confidently recommend t he "semi- 
automatic" scheme of lFearnhead fc Prangld l|2012r i for sum- 
mary statistic refinement. 

A Note on One Alternative As mentioned in the in- 
troduction to this Section the subject of summary statistic 
selection for ABC analysis remains an active area of research 
in the statistical literature, hence it is worth reviewing 
here briefly another popular alternative we have neglected 
to demonstrate above for the sake of b revity. Namely, the 
"appr oximate sufficiency" algorithm of I Joyce fc Marjoram! 
(2008) for iteratively building a master summary statistic 
from the union of randomly trialled candidates, with each 
new addition accepted only if it offers an improvement in 
parameter inference exceeding some threshold. This algo- 
rithm may be of particular interest for SAM-based studies 
of galaxy formation given the wide variety of available ob- 
servational benchmarks from which summary statistics may 
be composed (see Footnote ll5[) . though it has been criticised 
for a dependence on the (random) order in which the candi- 



date statistics are tested at each application (i.e., the stated 
search procedure is far from exhaustive). Note also that al- 
thoug h our above demonstration of the INunes fc Baldingl 
1)20101 ) procedure is presented in terms of selecting a unique 
summary statistic from four evidently degenerate choices 
(i.e., fc = {1, 3, 6, 12}) an optimal union of summary statis- 
tics may also be identified via the minimum entropy/MRSSE 
criterion, though possibly at great computational expense if 
the original set of basis candidates is large and there are 
many permutations of interest. 

3.4 SMC ABC Posteriors for Our Stochastic 
Model of Morphological Transformation 

In Figure [9] we present posterior probability densities for 
the key parameters of our stochastic model of morphologi- 
cal transformation at high redshift in the "independent evo- 
lution" recovered from SMC ABC using our opti- 
mised summary statistic (cf. Section T3. 31 above). S* : fc = 3. 
The approximate solution shown here represents the state 
of a 10,000 particle population progressed through four 
rejection-resampling iterations with an a = 0.75 rejection 
rate and a c = 0.90 target refreshment rate (cf. Section 
13. 2|) . Comparison against our "tractable likelihood" -based 
MCMC benchmark (for this tractable case of our model) 
presented earlier in Figure [4] highlights the robustness of the 
ABC approach. That is, without reference to the explicit 
likelihood function of the system at hand this simple pro- 
cedure of strategic simulation and discrepancy thresholding 
has nevertheless produced a most satisfactory approxima- 
tion to the true posterior, capturing the key features of each 
marginal and bivariate joint density un der investigation. A s 
is typical of ABC analysis though (e.g. ICsillerv et al.ll2010l ) 
the credible intervals thus derived (see, in particular, those 
for the M ga i > 10 n M o merger rate, A m (i), shown in the 
top right panel) are noticeably broader than those of the 
benchmark "tractable likelihood" -based solution, owing to 
the two key sources of ( "extra- Monte Carlo") approxima- 
tion error at play here: (i) the non-zero tolerance for the 
simulated-observed dataset discrepancy required to achieve 
a workable acceptance rate; and (n) the inherent degeneracy 
of summary statistic match ing over full dataset matching (cf. 
iFearnhead fc Pranglei[2012l . for instance). 

In Figure [101 we present for comparison the SMC ABC 
posteriors recovered from the "co-evolution" case of our 
model for which the likelihood function is thoroughly in- 
tractable, and thus the standard toolbox of ("tractable 
likelihood" -based) MCMC simulation entirely is inaccessi- 
ble. As described in Section ^. li the only difference between 
our "co-evolution" and "independent evolution" simulations 
is that the birth times of galaxies in close associations are 
coupled in a manner believ ed "realistic" from analysis of the 
|Pe Lucia fc BlaizotJ (|2007f) SAM in the former but not in the 
latter. Hence it is perhaps unsurprising that the posteriors 
for this example are only subtley different from those pre- 
sented above, with a slight decrease in confidence regarding 
the true value of the merger rate (i.e., the joint, marginal 
density of a mC rgc, Emerge, and £t>r) and a slight increase in 
confidence regarding the merger visibility timescale (from 
7-irr morph * 0.63 ±° i° 8 [la] to t 1ti morph * Q.53±8i|). Nev- 
ertheless this demonstrated ability of ABC to handle models 
with intractable likelihoods, and thus to permit the deriva- 
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Figure 9. SMC ABC posterior probability densities for the key parameters of our stochastic model of morphological transformation at 
high redshift (in the "independent evolution" case). In each of the main diagonal panels we compare the marginal posterior density of a 
single parameter (in blue) against its prior (in grey), while in each of the off-diagonal panels below we extend this comparison to the joint 
density formed by pairing that parameter against one of its peers. For the latter visualisation we employ a lattice of variable-sized points 
to trace the SMC ABC posterior on a scale of 1, 2.5, 7.5, and 15 times some appropriate baseline probability density, while grey-shaded 
tiles map the corresponding prior on an identical scale. In the upper right panel we plot the (pointwise) la and 3cr credible intervals 
and median curve (in dark grey, light grey, and blue respectively) for the M ga i > 10 11 Mq merger rate, A m (t), deriving from our (joint, 
marginal) posterior densities on a me rge, Emerge, and for- 



tiori of robust Bayesian constraints from arbitrarily "real- 
istic" (i.e., complex) simulations, offers a wealth of possi- 
bilites for astronomical studies far beyond the present ex- 
ample which cannot be over-stated. 



4 ASTROPHYSICAL RESULTS & 
DISCUSSION 

Having completed our exposition of the ABC algorithm in 
Section [3] above we take the opportunity here to explore a 
number of interesting astrophysical results arising from our 



chosen case study in morphological transformation at high 
redshift. In Section 14.11 we compare our SMC ABC-based 
constraints on the evolving merger rate in the early Uni- 
verse against recent estimates from the literature based on 
simple close pair and asymmetric galaxy counts, highlight- 
ing the superior informative power of the former over the 
latter. Then in Section 14.21 we discuss our (posterior) pref- 
erence for merging over secular evolution as the dominant 
pathway to early bulge formation in the context of contem- 
porary hydrodynamical and "semi-empirical" simulations. 
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Figure 10. SMC ABC posterior probability densities for the key parameters of our stochastic model of morphological transformation 
at high redshift (in the "co-evolution" case). In each of the main diagonal panels we compare the marginal posterior density of a single 
parameter (in blue) against its prior (in grey), while in each of the off-diagonal panels below we extend this comparison to the joint 
density formed by pairing that parameter against one of its peers. For the latter visualisation we employ a lattice of variable-sized points 
to trace the SMC ABC posterior on a scale of 1, 2.5, 7.5, and 15 times some appropriate baseline probability density, while grey-shaded 
tiles map the corresponding prior on an identical scale. In the upper right panel we plot the (pointwise) la and 3cr credible intervals 
and median curve (in dark grey, light grey, and blue respectively) for the M ga i > 10 11 Mq merger rate, A m (t), deriving from our (joint, 
marginal) posterior densities on a me rge, Emerge, and t^. 



4.1 The Evolving Merger Rate at the Highest 
Redshifts 



As mentioned in the Introduction to this paper the recent 
installation of WFC3 on HST has at last made accessible at 
high resolution the rest-frame optical morphologies of mas- 
sive galaxies at the ep och of peak cosmi c star formation and 



Lilly et al l 1 19961 ; iMadau et ai]|l99rj ; 



AGN activity (z ~ 2; ^ ...... 

lOesch et all |2012| ; IWarren et alTll994l ) , opening a unique 
window into the structural assembly of this first 



T— 

tion of Hubble sequen c e analogues JCameron et al.l l2011bl ; 
IConselice et~aH l2011al ; [s zomoru et all l2011al )~ A particu- 



lar motivation for our present case study in morphological 
transformation was to highlight the potential of model-based 
demographic analysis as a means to exploit this wealth of 
new data. In this Section we thus explicitly demonstrate the 
advantages of such an approach (as implemented here via 
SMC ABC) over the standard "one type at a time" mode of 
study. 

Given the (expected) importance of merging as a 
driver of both star formation and morphological transfor- 
mation under the can onical hierarchical clustering paradigm 
(White & Rccs 197&|) it is perhaps no great surprise that a 
total of four separate teams have recently attempted con- 
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straint of the z > 1.5 (major) merger rate based on close 
pair and/or a symmetric ga l axy counts in deep near-IR imag- 
ing. Name l y. (Bluck et all <|2009l , l201ll ) in the GNS field, 
iMan et all |2012l ) in the C OSMOS fie l d (in t he sub-region of 
HST/NICMOS coverage). ILaw et all |2012l ) in a dedicated 
HS T/WFC3 study f rom C ycle 17 sampling multiple fields, 
and IWilliams et all (|201lh in the UDS field; the first two 
exploring the same M ga i > 10 regime as in our pa- 

per and the latter two probing much further down the mass 
function. 

Having identified their target population of impend- 
ing mergers (in the case of close pair selection) or recent 
merger remnants (in the case of asymmetric galaxy selec- 
tion) each of these teams has then proceeded to estimation 
of the early Universe merger rate in the following manner. 
First, they compute the merger fraction as the number of 
mergers detected divided by the total number of galaxies 
within the target redshift interval and mass range minus 
the expected proportion of false detections (arising, for in- 
stance, from chance alignments along the line-of-sight) 
Second, they adopt a third-party estimate of the merger 
visibility timescale based on N-body/hy drodynamical sim - 



ulations of galaxy-galaxy collisions; e.g., Man et all (1201 
adopt T c io 



0.4 + 0.2 Gyr from lLotz et al.1 



And third, they either estimate the comoving volume den- 
sity of their target galaxy popula tion directly from the total 
count in the survey volume (as in lMan et al.ll2012r ). or adopt 
again a (hopefully more robust, i.e., less prone to cosmic 
v ariance error) thir d-party estimate from the l iterature (as 
111 Igluck et af[|2009l who take & > \o 11 M Pt ( z ) from lDrorv et all 
120051 ). The merger rate by volume is then computed simply 
as / m x $/ r . 

In the lefthand panel of Figure [TJJ we compare the re- 
sulting estimates of \ m (t) from the clos e pair studies of 
iBluck et all (120091 ) and IMan et all |2012l ) (denoted BL09 
and MA12 here, respectively) against the credible inter- 
vals derived from our SMC ABC analysis of the Bll CAN- 
DELS/EGS morphological mix (in the "co-evolution" case 
of our model). We present the aforementioned pair count 
estimates — which are all well outside our model-based la 
credible interval — as upper bounds on the major merger 
rate, since we believe their use of a 1:4 mass ratio threshold 
may admit a significant number of "weak" accretion events 



20 Following IConselicd J2006I) , IBluck et all d2009l . l201ll ) advo- 
cate correction of this raw merger fraction, which they denote 
/ m , to a "galaxy merger fraction", denoted f gm , via the relation, 



Is n 



2/„ 
l + f P 



The motivation for this correction is to identify the 



"number of galaxies merging as opposed to the number of merg 
ers". However, we believe this to be a false move in context of 
their analysis of the most massive galaxies in the GNS, in which 
they identify impending mergers as those Af ga i > 10 11 Mq sys- 
tems host to a close companion (< 30 kpc) of similar brightness 
(i.e., within +1.5 mag in the observed //-band; corresponding to 
a lower bound on the mass ratio of ~1:4). Owing to the steepness 
of the galaxy stellar mass function at z > 1.5 the vast majority 
of such companions will almost certainly be sub-10 11 MQ — indeed 
in our sample we find j ust one close pair in which both are truly 
high mass galaxies, and Man et all : 21 1 1 2T) find just two. Hence, by 
"correcting" to f gm as above one is in fact estimating the merger 
rate experienced by both massive galaxies and the (ill-defined) 
population of less massive galaxies that will ultimately merge 
with them, which does not seem a particularly useful exercise. 



unlikely to generate a substantial change in morphology; e.g. 
iHopkins et al.l (|2009al lbi) favour instead a 1:3 ma ss ratio for 
the m ajor/minor distinction. Interestingly, the IMan et all 
l|2012i ) results are at least qualitatively consistent with our 
posterior inference for the location of the break epoch at 
ibr ~ 2.55 Gyr (since z = 6; i.e., z ~ 1.75) — though we note 
in any case that the plotted datapoints do carry rather larg e 
uncertainties (of -88% at the la level; iMari et al.ll2012l ). 
owing in particular to the contribution from cosmic vari- 
ance error (and neces sarily stellar mass estimation error; cf. 
iBrammer et alj|201ll and Section (3T) in the determination 
of $>i O iim (z). 

As a more faithful comparison we present in the right- 
hand panel of Figure [TT] the merger rate inferred from 
application of the f m x $/r formu l a to b oth the asym- 
metric galaxy counts of IBluck et al.l (|201ll ) (denoted here 
as BL11) and the raw Type IV counts of our Bll CAN- 
DELS/EGS dataset (deno ted C12). For th e former we adopt 
ta «s 0.6 ± 0.3 Gyr from IConselice et~ai1 l|2009al ) (see also 
lLotz et al]|2008l ) and for the latter our prior of Ti rr mOT ph * 
0.55 + 0.25 Gyr; and in both cases we employ our in-house 
estimate of < ! > >ioiia/q( 2 ), which has been calibrated against 
a large compilation of recent datapoints from the literature 
in a manner accounting for the presence of systematic bi- 
ases in the underlying SED-fit based stellar masses of each 
contributing survey (see Section 13.11 and Figure [2]). Given 
our definition of A m (t) as the rate of mergers experienced 
by those systems already in excess of 10 11 Mq at the stated 
epoch we must allow for the presence of galaxies only pro- 
moted above this mass threshold by the aforesaid accretion 
event by scaling back our raw count-based estimates accord- 
ing to the factor, x ^ w , with W the "nuisance parameter" 
introduced in Section r3.ll The error bars accompanying each 
datapoint in this righthand panel of Figure [TTJdenote the la 
credible intervals accounting for the relevant uncertainties 
in these estimates (including, of course, those on the es- 
timation of the population proportion from bino mial count 
data, o ften mis-handled in astronomical studies; cf. lCameronl 
l2011al ). 

Immediately evident from inspection of this righthand 
panel of Figure [TTJ is the reasonable agreement (well inside 
the la errors) between our Type IV count-based estimates 
and those based on the asymmetric galaxy counts from 
IBluck et af]|201ll . confirming a fair degree of equivalence be- 
tween the use of visual classification and non-parametric, 
quantitative indicators for the selection of high redshift 
mergers. Perhaps the most striking impression made by this 
comparison, however, is the marked offset between the me- 
dian curve of our SMC ABC constraint on X m (t) from full 
demographic analysis of the Bll (CANDELS/EGS) sample 
and those estimates based only upon the Type IV count at 
t < 2.5 Gyr (since z = 6; i.e., at z > 2.25). In particular, our 
median curve here favours a higher merger rate, effectively 
splitting the difference against the higher asymmetric galaxy 
count-based datapoint at t ~ 1.5 Gyr (since z = 6). The ex- 
planation for this offset lies, of course, in the fact that we fit 
our model not just against the merger fraction but rather the 
full morphological mix, meaning that the fitted merger rate 
must not only account for the number of ongoing mergers at 
a given epoch but also the observed population of evolved 
systems (i.e., ellipticals and bulge-dominated disks) which 
must have (or in the latter case, will very likely have; see 
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Figure 11. The evolving merger rate over the first ~3 Gyr of 
massive galaxy formation [z ~ 6 to z ~ 1.5). Here A m (t) rep- 
resents the rate by volume of major (viz. morphology-changing) 
mergers experienced by those systems already more massive than 
10 11 Mq prior to this accretion event. As in Figure fTTJl above we 
plot the (pointwise) ler and 3cr credible intervals and median 
curve deriving from our SMC ABC analysis of the Bll (CAN- 
DELS/EGS) morphological mix in dark grey, light grey, and blue, 
respectively. The close pair count-based estimates of BL09 and 
MA12 (with the former scaled down by a factor of 2 to remove 
their "correction" from f m to f grn ; cf . Footnote I20jl are over- 
laid in the lefthand panel; marked here as loose upper bounds 
since their threshold of a 1:4 mass-ratio for designation a s a "m a- 
jor" merger may be overly generous (e.g. iHopkins et al,l l2009a b 
favour 1:3 alternatively). As a more faithful comparison in the 
righthand panel we present asymmetric galaxy count-based esti- 
mates derived from BL11 (following application of our in- house 
calibrations for < ! > >ioiimq ( 2 ) an d see Section [XT] and Figure 
[2t . Also overlaid here are raw Type IV count-based estimates from 
our (C12) visual classifications, along with an estimate based on 
the observed Type I and Type II count in our lowest redshift bin; 
the arrow on the latter noting the fact that the marked position of 
this datapoint corresponds to the limiting case of all these mergers 
having occurred with only just enough time to allow fading of the 
characteristic post-merger irregular features prior to observation. 



4.2 The Dominant Role of Merging over Secular 
Evolution for Early Bulge Formation 

Another interesting feature of the ABC-based model con- 
straints derived in this paper concerns the relative dom- 
inance of merging over secular evolution as the favoured 
mechanism responsible for building up the first generation 
of massive bulges in early-type disks. In particular, the pos- 
terior probability density of our Ps P h+D remnant parameter 
favours production of a Type n system in f» 33 + 17% [lcr] 
of mergers at these high redshifts — where the gas-rich na- 
ture of the progenitors has previously been argued as con- 
ducive to disk survival and/or rapid reformation around a 
centr al spheroid on the basi s of hydrodynamical sim ulations 
(e.g. iRobertson et all 120061 ; IHopkins etall l2009al ; but see 
iBournaud et al.ll201ll regarding the difficulties of reproduc- 
ing such merger outcomes in models with a re alistically cold, 
turbule nt interstellar medium). Interestingly. IHopkins et alj 
(2009b) have also argued for Type I suppression in gas-rich 
mergers as a solution to the inconsistency between conven- 
tional SAMs and the observed bulge and disk demographics 
in the local Universe. The posterior density on the secu- 
lar evolution timescale for our model, r scc cv 2.8+fJ [1°"] 
Gyr, on the other hand, renders this rival pathway to bulge 
formation rather unlikely for z > 1.5 disks. Indeed given the 
above posterior means one can confirm via repeated simu- 
lation from our model that on average only ~ 7% of Type 
II systems detected at these redshifts will have been formed 
via secular evolutio n. This result is con siste nt with the re- 
cent o bservations of iGenel et al.1 (|2010) and IHopkins et all 
l|201ll ) from hydrodynamic simulations in which wind-driven 
feedback appears to destroy all but the most massive clumps 
in less time than required for their inwards migration under 
dynamical friction. 



Section T4.2I below) undergone a merger in their recent past. 
As an indication of the contribution of Type I and Type n 
countf|3 to our fit of \ m (t) we have also marked in Figure fTTI 
a pseudo-A m datapoint estimated by treating these evolved 
galaxy types as ongoing mergers observed at tobs — Ti rr morph. 
The pseudo-A m datapoint thus computed confirms that the 
past rate of merging was very likely to have been higher 
than that indicated by our raw Type IV counts. Our counts 
of Type I and n galaxies also contribute a valuable source 
of data for constraining ri rr m0 rph, which through our ABC 
analysis we verify is indeed probably close to our prior ex- 
pectation (i.e., our prior mean of ri rr m0 r P h *** 0.55 + 0.25 
falls well within the la credible interval of our posterior, 
Tirr morph & 0.53+q;iJ). The associated increase in confi- 
dence regarding the true value of rirr morph contributes sig- 
nificantly to the reduced width of our SMC ABC credible 
intervals on \ m (t) (based on the full demographics) relative 
to those based only on our Type IV counts. 



In fact, we downweight the count of Type II galaxies in this 
calculation by (1 — fs p h+D remnant) to acknowledge the (minor) 
role of secular evolution in early bulge formation, as discussed 
further in Section 14.21 below. 



5 CONCLUSIONS 

In this paper we have demonstrated the potential of "Ap- 
proximate Bayesian Computation" for astronomical model 
analysis through a detailed case study in the morphologi- 
cal transformation of high redshift galaxies. In the process 
we have derived tight constraints on the evolving merger 
rate in the early Universe, and revealed the relative dom- 
inance of merging over secular evolution for bulge forma- 
tion at these epochs, through an ABC-based examination 
of the full population demographics of an M ga i > 10 11 Mq, 
1.5 < z < 3 sample from the Bll CANDELS/EGS dataset. 
More importantly though our exposition of the contempo- 
rary "Sequential Monte Carlo" implementation of ABC as 
well as two modern approaches to summary statistic selec- 
tion will hopefully guide and inspire further astronomical 
applications of this powerful statistical technique. 
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APPENDIX A: LIKELIHOOD COMPUTATION 
FOR OUR STOCHASTIC MODEL OF 
MORPHOLOGICAL TRANSFORMATION IN 
THE "INDEPENDENT EVOLUTION" CASE 

In this Appendix we derive the likelihood function, P(y\0), 
of the observed data— i.e., the (HST WFC3/IR) #-band de- 
mographics for our sample of 126 galaxies at 1.5 < z < 3 
and Mgai > lO n M selected from the Bll CANDELS/EGS 
dataset — given a particular set of input parameters, = 

{f^merge, /Emerge, tbr, -Psph + D remnant, 7~scc cv , 7"Irr morph } , Under 

the "independent evolution" case of our stochastic model 
for high redshift morphological transformation. To this end 
we must first compute generic expressions for the probabil- 
ity densities of the time of birth and the time of last major 
merger given an arbitrary redshift of observation. Integra- 
tion of the conditional transition probabilites of the per- 
mitted evolutionary pathways to a given morphology as a 
function of the above returns the likelihood of observing 
that type at a particular redshift, and via the independence 
assumption allows a simple formulation of the complete like- 
lihood function. 

Probability Density for the Time of Birth The 
"birth" of galaxies in our model (i.e., the promotion, via 
star formation or merging, of new systems to the top end of 
the 1.5 < z < 6 stellar mass function) is characterised as a 
non-homogeneous Poisson process of rate, 



\b{t) 

in units of Mpc _3 Gyr _1 with the origin of the time variable 
set to z = 6. Both K and 7 are treated here as nuisance pa- 
rameters with f K ,~t ~ A/rw. ([-4.1, 0.65]', [0.06 2 ,0.1 2 ;p = 
0.05]; < 7 < 1). The waiting time distribution for galaxy 
births under the above-specified Poisson proces^ffl is, of 
course, exponential in Af,-space, where 



f 

Jo 



A 6 (t) = X b (t)dt = 



K.7+1 



10 A t 7 

7 - 



Given that each galaxy experiences (by definition) only a 
single birth the At-epoch of this event represents a unique 
draw from the Uniform distribution on [0, A(,(t bs)] with t hs 
the cosmological time since 2 = 6 at the observed redshift, 
Zi. Transformation of variables back to the time domain 
specifies a probability density for the time of birth, tbirth, 
on [0, t obs ] of 

7 + 1 



/b(ibirth)rfibirth 



7 + 1 ^birth^birth- 



Probability Density for the Time of the Last Ma- 
jor Merger As for the case of the birth function exam- 
ined above, merging is treated under our stochastic model 
as a non-homogeneous Poisson process, with a variable rate 
by volume (in Mpc~ 3 Gyr _1 ) set by the input parameters, 

A 

Xm(t) = ' 



merge; /Vmerge; &HCi £br? of 



0B 



merge Emerge 
(t-ibr)"mcrgc(|8 n 



for < t sS t br , 

for tbr < t ^ tj.,5. 



*1.5— *br 



22 We note for reference that lGlade et al provide a brief 

review of the fundamentals of Poisson processes in their recent 
paper on the Drake equation. 



Here ti.g is used to denote the cosmological time between 
2 = 6 and 2 = 1.5 (the lower bound of our sample). The 
number of mergers experienced by an individual galaxy prior 
to observation for a given birth time is thus Poisson dis- 
tributed with rate, 



^ t birth 



A m (t) 



dt — r m (tobs) — Pm (tbirth), 



where r m (t) 

a merge Pme 



(7+1) 2-7 

^ merge Emerge ("T ~\~ 1 ) 



tZJ + 



K(2 — fj "br 
/3mcrgc(7+l) 

K7 ^br 
(.flmorgc-l) v 



for < t «S tbr 



for tbr < t s£ £1.5. 



E 1.5 

7+iciizl 

K \ 1-7 



7 



7(1-7) / 

That is, the probability of a galaxy experiencing k mergers 
between its epoch of birth and its epoch of observation is 



P(N m = fc|t birth ) = 



(V* 

\ r, 



) k e L m 



kl 



with two cases of particular interest being P(N m = 

0|i b irth) = e" r * and P(N m > Q|* h irth) = 1 - e~ r * . More- 
over, for the latter the (non-zero) N m = k mergers will be 
distributed uniformly in T m -space on [r m (tbirth), r m (i b a )]. 
The corresponding probability density of T m -epoch of the 
most recent merger is then simply that of the fc-th order 
statistic, 



/r m ,(fc)(r m )rfr„ 



(tbirth)) 

(rto 1 



-dT„ 



(remembering that = V m (t bs) ~ r m (tbirth)). Summation 
over all possible (non-zero) merger counts, k = 1, . . . , 00, 
weighted by their respective probabilites, P(N m = k), re- 
turns the overall probability density of the most recent 
merger r m -epoch as 



/ rr „(r m )dr m = 



E!Llfr m ,(k)(T m )P(N m = fc|tbirth) 



00 fc(r„ 1 -r m (t birth ))'° 



P(N m > Ojtbirth) 



VHOO 
2-lk = l 



(r*) 



-dTn 



<J r m -r m (i b irth) 



-dTn 



Once again transformation of variables delivers the form of 
this density back in the time domain. Namely, 



ft^{tm)dt 

n 



r m (t m )-r m (t 



birth) \ If \ 



dt „ 



e r*_i Ab(t m ) 

Likelihood of a Type III Late- Type Disk As de- 
scribed in Section \3 . 1 1 galaxies in our model may be born as 
either Type ill disks or Type IV ongoing mergers, with the 
probability of the latter set by 

P(Ci(tbirth) = IV|tbirth, 0, 0) = — - tj- blrth ^ . 

Ab(tbirth) 

We note explicitly here the conditional dependence on our 
suite of nuisance parameters, = {K, 7, W}; where this 
final element, which derives from the shape of the 2 ~ 1.5- 
3 stellar mass function, is taken as W * 0.5 + 0.2, i.e., 
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fw ~ WTrunc.(M = 0.5, cr = 0.2; W > 0). Since no trans- 
formation process permits a transition (back) to Type ill 
from outside this state (i.e., Type ill represents a transient 
class of the morphological type Markov chain; see Figure [3} 
the corresponding likelihood is the easiest to derive — it is 
merely the probability that a galaxy is born a Type ill disk 
and experiences neither merging nor secular evolution prior 
to observation. 

For a given tbirth (with < tbirth < t b s ) the probabil- 
ity of classification d = ill under our stochastic model for 
morphological transformation is thus 

Pi(Ci = IIl|tbirth, 0) 

CO 

= J]j P(d (tbirth) = in n N m = n S = 0|t birth , 0, 0) 
o 

fK.-ydKd'jfwdW 

00 

= J]j P(Gi(tbirth) = nilfwrth, 0, ©)P(iV m = 0\t bh . th , 0, 0) 
o 

P(S = 0|fbirth, 0, &)f Kn dKdjf w dW. 

Here P(S = 0|tbirth, 0, 0) represents the null secular evo- 
lution probability, which operates independently of the null 
merger probability and the probability of birth as a Type 
ill system. According to the Gamma-distributed form of the 
secular evolution timescale under our model, 

P(S = 0|tbirth, 0, ©) = Fbamma(l + 50T aoc cv ,50) (tobs — tbirth), 

and, as noted above, 

P(N m =0|t b irth, 0,0) = e- r *. 

Integration of this expression by tbirth over the correspond- 
ing density, /(, (tbirth), gives the general likelihood of Type 
ill morphology for the galaxy at hand, 



Pi(d = IIl]0) = 



f 

Jo 



Pi(Ci = IIl|tbirth, #)/f)dtbirth- 



Analytic solutions for this integral exist in a number of spe- 
cial cases, such as 7 = and 7 = 1 (the latter in terms of 
the error function), but not to our knowledge for arbitrary, 
non-integer 7 « 0.65 as required to fit the observed build up 
in number density above lO n M0 (see Figure [2]). To evalu- 
ate this expression within our MCMC code we thus employ 
the efficien t numerical technique of Monte Carlo integration 
(iLiulbOOlT ) at run-time. 

Likelihood of a Type I Spheroid In contrast to the 
simple case of the Type ill disk presented above there ex- 
ist an infinite variety of evolutionary pathways potentially 
leading to the production of a Type I spheroid under our 
model. However, these pathways may all be considered de- 
generate with respect to the instance of one final merger 
(possibly that of the galaxy's birth) followed by fading of 
all post-merger irregular features and settling of the merger 
remnant into a spheroid morphology (see Figure [3}. The 
probability of transition through this penultimate step is, of 
course, dependent upon the time of the final merger, t m . Fol- 
lowing the derivation above we write down the likelihood of 
Type 1 formation for the i-th galaxy with t b s corresponding 
to Zi as an integral over tbirth (and now t m also) as 



T'oba 

Pi(Ci = l|0) = Pi(Ci = l|0, tbirth)/i)dtbirth 

Jo 

with 



Pi(d = i\e,t b 



[P(N m > O|tbirth,6>,0) 



ftobs 
" ti-,;i-+v, 



l\t m ,0,®)P(E = l\0)f t r m dt n 



+ P(Ci (tbirth) = IV|tbirth, 9, Q)P(N m = 0|tbirth, 9, &) 

P(R = l|t bi rth,0,0)P(£ = l\9))f Kr/ dKdjf w dW. 

Here P(R = l|t m ,#, 0) = fGamma(l + 100T Irr morph.100) (^obs — 

tm) represents the probability of the post-merger irregular 
features fading to reveal the final remnant morphology prior 
to observation, and P(E = 1\9) = 1 - Ps P h+D remnant repre- 
sents the probability that the final remnant emerges a Type I 
pure spheroid rather than a Type II spheroid-plus-disk. The 
first half of this equation corresponds to the case of Type I 
classification at t b s for a galaxy which has experienced at 
least one merger since birth, while the second corresponds to 
the case of a sole, primal merger. Expansion of this expres- 
sion in full returns another integral with no simple closed 
form, so once again Monte Carlo integration is ultimately 
required for its evaluation. 

Likelihood of a Type IV Ongoing Merger The 
likelihood of observing a Type IV ongoing merger is eas- 
ily derived from the case of the Type I spheroid above with 
the trivial changes required being a complementation of the 
probability of the post-merger irregular features fading and 
removal of the binomial merger remnant type probability. 

Likelihood of a Type II Spheroid-plus-Disk As il- 
lustrated in Figure [3] there are in fact two non-degenerate 
pathways to formation of a Type II spheroid-plus-disk sys- 
tem under our model for high redshift morphological trans- 
formation, corresponding to merging or secular evolution al- 
ternately. So, in principle, writing down the likelihood of 
this type, Pi(d = n\0), should be the most difficult of all. 
However, having derived likelihoods for each of the other 
morphological types one may simply evaluate this case as 

Pi(Ci = u\9) =l-Pi(Ci = i\9) - P l {C l = m|0) 

-Pi(d = IV|0). 

Such an approach of course requires accurate evaluation of 
the likelihoods for all three alternative morphologies. Testa- 
ment to the robustness of our Monte Carlo integration ap- 
proach is the fact that upon setting Psph+D remnant = 0.5 
and Tscc cv = 100 Gyr, for which theoretically, Pi(d = 
l|0) = Pi(C'i = n|0), the computational agreement of these 
likelihoods (for a single galaxy) evaluated in R typically 
extends to at least the 4th significant figure (with just 
rime ~ 1000). 

Likelihood of the Full Dataset Recovering the like- 
lihood of the full observational dataset, P(y, 9) with y = 
{yi : (d,Zi)} (i = 1, . . . , iV ga i), in the "independent evolu- 
tion" case is then simply a matter of taking the product of 
likelihoods for each individual galaxy (as even neighbouring 
galaxies evolve uncoupled in this scenario). Thus, 



P(y\9) = Y\ P(d\9). 
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Of course, when the assumption of independence is relaxed 
(in order to build a more physically realistic model) in the 
manner of the "co-evolution" case considered in Section 
13.1.1.21 this full dataset likelihood function is no longer so 
readily tractable, and ABC methods thus become essential 
for reconstructing the posterior probability densities of the 
key input parameters. 

Markov Chain Monte Carlo Simulation Using the 
above expressions for P(y\6) one may easily generate the 
benchmark ( "tractable likelihood" ) posteriors shown in Fig- 
ure [4] via the (standard) random walk MCMC algorithm. 
That is, from the current state, Oi, propose a new state, 
Oi+i = Oi + S, by sampling S from some zero mean distri- 
bution, and accept this proposed state with probability 

^ +1 )P(y|fl T ) 

Here we employ the symmetric multivariate Normal distri- 
bution, Af(0, S), for S with S = S pr i or /5 chosen (by trial- 
and-error) to produce, on average, an ~40% acceptance rate 
for Qi+i. Running two separate threads of R on a 2.7 GHz 
dual core (4 GB ram), 13 inch Macbook Pro laptop we were 
able to verify satisfactory convergence of this chain after 
completing a target of 100,000 MCMC steps (with a 1,000 
step burn-in period on each thread) in a little less than 48 
hours. 
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